Abstract

In this paper we consider non parametric finite translation mixtures. We prove that 
all the parameters of the model are identifiable as soon as the matrix that defines the joint 
distribution of two consecutive latent variables is non singular and the translation parameters 
are distinct. Under this assumption, we provide a consistent estimator of the number of 
populations, of the translation parameters and of the distribution of two consecutive latent 
variables, which we prove to be asymptotically normally distributed under mild dependency 
assumptions. We propose a non parametric estimator of the unknown translated density. In 
case the latent variables form a Markov chain (Hidden Markov models), we prove an oracle 
inequality leading to the fact that this estimator is minimax adaptive over regularity classes 
of densities. 

Keywords: translation mixtures; non parametric estimation; semi-parametric models;
hidden Markov models; dependent latent variable models.

Short title: Non parametric finite translation mixtures 



1 Introduction 



Finite mixtures are widely used in applications to model heterogeneous data and to do
unsupervised clustering, see for instance McLachlan and Peel (2000) or Marin et al. (2005)
for a review. Latent class models, hidden Markov models or more generally regime switching
models may be viewed as mixture models. Finite mixtures are therefore to be understood
as convex combinations of a finite number of probability distributions over the
space the data lives in, including both static (when the latent variables are independent)
and dynamical models. Most of the developed methods use a finite dimensional description
of the probability distributions, which requires some prior knowledge of the phenomenon
under investigation. In particular applications, it has been noticed that this
may lead to poor results and various extensions have been considered. The first natural
extension is to consider mixtures with an unknown number of components. This has
been extensively studied and used in the literature both from a Bayesian and a frequentist
point of view, see Akaike (1973), Richardson and Green (1997), Ishwaran et al. (2001),
Chambaz and Rousseau (2008), Chambaz et al. (2009), Gassiat and van Handel, to
name but a few. However, when the emission distribution, i.e. the distribution of each
component, is misspecified, this results in an overestimation of the number of components,
as explained in the discussion in Rabiner (1989). Thus, there has recently been interest
in considering nonparametric mixture models in various applications, see for instance the
discussion on the Old Faithful dataset in Azzalini and Bowman (1990), the need for non
parametric emission distributions in climate state identification in Lambert et al. (2003),
or the nonparametric hidden Markov model proposed in Yau et al. (2011). In the absence of
training data, mixture models with nonparametric emission distributions are in general not
identifiable without additional structural constraints. In a seminal paper, Hall and Zhou
(2003) discussed identifiability issues in a 2-component nonparametric mixture model under
repeated measurements (or multivariate observations) and showed that identifiability essentially
only occurs if there are at least 3 repeated measurements for each individual. This work has been
extended by various authors including Kasahara and Shimotsu (2007), Bonhomme et al.
(2011) and references therein. Recent identifiability results about mixtures may also be
found in Allman et al. (2009).
Consider location models

\[ Y_i = m_{S_i} + \epsilon_i, \quad i \in \mathbb{N}, \tag{1.1} \]

where $(S_i)_{i \in \mathbb{N}}$ is an unobserved sequence of random variables with finite state space $\{1, \ldots, k\}$,
$(\epsilon_i)_{i \in \mathbb{N}}$ is a sequence of independent identically distributed random variables taking values
in $\mathbb{R}$, and $m_j \in \mathbb{R}$, $j = 1, \ldots, k$. The aim is to estimate the parameters $k$, $m_1, \ldots, m_k$, the
distribution of the latent variables $(S_i)_{i \in \mathbb{N}}$ and the distribution $F$ of the $\epsilon_i$'s. As usual for
finite mixtures, one may recover the parameters only up to relabelling, and obviously, $F$ may
only be estimated up to a translation (which would be compensated by the reverse translation of
the $m_j$'s). However the identifiability issue is much more serious without further assumptions.
To illustrate the identifiability issues that arise with such models, assume that the $S_i$'s are
independent and identically distributed. Then the $Y_i$'s are independent and have distribution



\[ \sum_{j=1}^{k} \mu(j) F(\cdot - m_j). \tag{1.2} \]

Here, $\mu(j) \geq 0$, $j = 1, \ldots, k$, $\sum_{j=1}^{k} \mu(j) = 1$; $m_j \in \mathbb{R}$, $j = 1, \ldots, k$, and $F$ is a probability
distribution on $\mathbb{R}$. An equivalent representation of (1.2) corresponds for instance to $k = 1$,
$m_1 = 0$ and $F = P_{\mu,F}$ the marginal distribution. Hunter et al. (2004) have considered
model (1.2) with the additional assumption that $F$ is symmetric and under some constraints
on the $m_j$'s, in the case of $k < 4$; see also L. Bordes and Vandekerkhove (2006) and
Butucea and Vandekerkhove (2011) in the case where $k = 2$ for an estimation procedure
and asymptotic results.
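To make the location model concrete, the following minimal sketch simulates observations of the form "translation indexed by a latent state plus i.i.d. noise" when the latent states form a Markov chain. The function name, the 0-based state labels, the arbitrary initial state and the uniform noise are illustrative assumptions, not part of the paper.

```python
import random


def simulate_translation_hmm(n, transition, means, noise, seed=0):
    """Draw Y_1..Y_n with Y_i = m_{S_i} + eps_i and a Markov latent chain.

    transition[i][j] is P(S_{t+1} = j | S_t = i), means[j] is m_j and
    noise(rng) returns one draw of eps (any distribution F is allowed).
    States are 0-based here, unlike the 1-based indexing of the text.
    """
    rng = random.Random(seed)
    k = len(means)
    state = rng.randrange(k)  # arbitrary start, so the chain is only asymptotically stationary
    states, ys = [], []
    for _ in range(n):
        states.append(state)
        ys.append(means[state] + noise(rng))
        state = rng.choices(range(k), weights=transition[state])[0]
    return states, ys


# Hypothetical two-state example with a non-Gaussian noise distribution.
states, ys = simulate_translation_hmm(
    n=1000,
    transition=[[0.9, 0.1], [0.2, 0.8]],
    means=[0.0, 3.0],
    noise=lambda rng: rng.uniform(-0.5, 0.5),
)
```

Only `ys` would be available to the statistician; `states` is returned here purely to check the simulation.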

In this paper, we investigate model (1.1) where the observed variables are not independent
and may be non stationary. Interestingly, contrary to the independent case,
we obtain identifiability without any assumption on $F$ under some very mild conditions
on the process $S_1, \ldots, S_n$, see Theorem 2.1. To be precise, if $Q$ is the $k \times k$ matrix such
that $Q_{i,j}$ is the probability that $S_1 = i$ and $S_2 = j$, we prove that the knowledge of the
distribution of $(Y_1, Y_2)$ allows the identification of $k$, $m_1, \ldots, m_k$, $Q$ and $F$ as soon as $Q$
is a non singular matrix, whatever $F$ may be. Building upon our identifiability result, we
propose an estimator of $k$, and of the parametric part of the distribution, namely $Q$ and
$m_1, \ldots, m_k$. Here, we do not need the sequence $(S_i)_{i \in \mathbb{N}}$ to be strictly stationary: asymptotic
stationarity is enough, and $Q$ is then the stationary joint distribution of two consecutive
latent variables. Moreover, we prove that our estimator is $\sqrt{n}$-consistent, with asymptotic
Gaussian distribution, under mild dependency assumptions, see Theorem 3.1. When the
number of populations is known and if the translation parameters $m_j$, $j \leq k$, are known to
be bounded by a given constant, we prove that the estimator (centered and at $\sqrt{n}$-scale)
has a subgaussian distribution, see Theorem 3.2.

In the context of hidden Markov models as considered in Yau et al. (2011), we propose
an estimator of the non parametric part of the distribution, namely $F$, assuming that
it is absolutely continuous with respect to Lebesgue measure. This estimator uses the
model selection approach developed in Massart (2007), with the penalized estimated pseudo
likelihood contrast based on marginal densities $\sum_{j=1}^{k} \mu(j) f(y - m_j)$. We prove an oracle
inequality, see Theorem 4.1, which allows us to deduce that our non parametric estimator is
adaptive over regular classes of densities, see Theorem 4.2 and Corollary 1.

The organization of the paper is the following. In Section 2 we present and prove our
general identifiability theorem. In Section 3 we define an estimator of the order and of the
parametric part, and state the convergence results: asymptotic Gaussian distribution and
deviation inequalities. In Section 4 we explain our non parametric estimator of the density
of $F$ using model selection methods, and state an oracle inequality and adaptive convergence
results. Most of the proofs are given in the Appendices.



2 General identifiability result 

Let $\mathcal{Q}_k$ be the set of probability mass functions on $\{1, \ldots, k\}^2$, that is the set of $k \times k$ matrices
$Q = (Q_{i,j})_{1 \leq i,j \leq k}$ such that for all $(i,j) \in \{1, \ldots, k\}^2$, $Q_{i,j} \geq 0$, and $\sum_{i=1}^{k} \sum_{j=1}^{k} Q_{i,j} = 1$.
We consider the joint distribution of $(Y_1, Y_2)$ under model (1.1), which has distribution

\[ P_{\theta,F}(A \times B) = \sum_{i,j=1}^{k} Q_{i,j} F(A - m_i) F(B - m_j), \quad \forall A, B \in \mathcal{B}_{\mathbb{R}}, \tag{2.1} \]

where $\mathcal{B}_{\mathbb{R}}$ denotes the Borel $\sigma$-field of $\mathbb{R}$ and $\theta = (m, (Q_{i,j})_{1 \leq i,j \leq k, (i,j) \neq (k,k)})$, with
$m = (m_1, \ldots, m_k) \in \mathbb{R}^k$. Recall that in this case, ordering the coefficients $m_1 \leq m_2 \leq \cdots \leq m_k$
and replacing $F$ by $F(\cdot - m_1)$ leads to the same model, so that without loss of generality we
fix $0 = m_1 \leq m_2 \leq \cdots \leq m_k$. Let $\Theta_k$ be the set of parameters $\theta$ such that $m_1 = 0 \leq m_2 \leq
\cdots \leq m_k$ and $Q \in \mathcal{Q}_k$, where $Q = (Q_{i,j})_{1 \leq i,j \leq k}$, $Q_{k,k} = 1 - \sum_{(i,j) \neq (k,k)} Q_{i,j}$.
Let also $\Theta_k^0$ be the set of parameters $\theta = (m, (Q_{i,j})_{1 \leq i,j \leq k, (i,j) \neq (k,k)}) \in \Theta_k$ such that
$m_1 = 0 < m_2 < \cdots < m_k$ and $\det(Q) \neq 0$. We then have the following result on the
identification of $F$ and $\theta$ from $P_{\theta,F}$.

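Theorem 2.1 Let $F$ and $\tilde F$ be any probability distributions on $\mathbb{R}$. Let $k$ and $\tilde k$ be positive
integers. If $\theta \in \Theta_k^0$ and $\tilde\theta \in \Theta_{\tilde k}^0$, then

\[ P_{\theta,F} = P_{\tilde\theta,\tilde F} \implies k = \tilde k, \ \theta = \tilde\theta \ \text{and} \ F = \tilde F. \]

Remark 1 In the same way, it is possible to identify $\ell$-marginals, for any $\ell \geq 2$, that is
the distribution of $(S_1, \ldots, S_\ell)$, $m$ and $F$ on the basis of the distribution of $(Y_1, \ldots, Y_\ell)$.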



Remark 2 The independent case considered in Hunter et al. (2004), L. Bordes and Vandekerkhove
(2006), Butucea and Vandekerkhove (2011) is a special case where $\det(Q) = 0$, for which our
identifiability result does not hold. An important class of models is that of hidden Markov
models. In that case, if $Q$ is the stationary distribution of two consecutive variables of the
hidden Markov chain, $\det(Q) \neq 0$ if and only if the transition matrix is non singular and
the stationary distribution gives positive weights to each point. When $k = 2$, we thus have
$\det(Q) \neq 0$ if and only if $S_1$ and $S_2$ are not independent.
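The condition in Remark 2 can be checked numerically in the two-state case. The sketch below builds the joint matrix of two consecutive states as (stationary weight of i) times (transition probability from i to j) and compares a dependent chain with the independent case; the transition matrix, its stationary distribution (2/3, 1/3) and the function names are illustrative choices.

```python
def joint_matrix(pi, transition):
    """Joint law of two consecutive states: Q[i][j] = pi[i] * transition[i][j]."""
    k = len(pi)
    return [[pi[i] * transition[i][j] for j in range(k)] for i in range(k)]


def det2(q):
    """Determinant of a 2 x 2 matrix."""
    return q[0][0] * q[1][1] - q[0][1] * q[1][0]


pi = [2.0 / 3.0, 1.0 / 3.0]

# Dependent chain: pi is stationary for this transition matrix.
q_dependent = joint_matrix(pi, [[0.9, 0.1], [0.2, 0.8]])

# Independent case: every row of the transition matrix equals pi,
# so Q is the rank-one matrix pi pi^T and its determinant vanishes.
q_independent = joint_matrix(pi, [pi, pi])

print(det2(q_dependent))    # non-zero: the identifiability condition holds
print(det2(q_independent))  # zero up to rounding: the condition fails
```

In general the determinant of such a joint matrix is the product of the stationary weights times the determinant of the transition matrix, which is the equivalence stated in the remark.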



Proof of Theorem 2.1

Denote by $\phi_F$ the characteristic function of $F$, $\phi_{\tilde F}$ the characteristic function of $\tilde F$, $\phi_{\theta,1}$
(respectively $\phi_{\tilde\theta,1}$) the characteristic function of the distribution of $m_{S_1}$ under $P_{\theta,F}$ (respectively
under $P_{\tilde\theta,\tilde F}$), $\phi_{\theta,2}$ (respectively $\phi_{\tilde\theta,2}$) the characteristic function of the distribution
of $m_{S_2}$ under $P_{\theta,F}$ (respectively under $P_{\tilde\theta,\tilde F}$), and $\Phi_\theta$ (respectively $\Phi_{\tilde\theta}$) the characteristic
function of the distribution of $(m_{S_1}, m_{S_2})$ under $P_{\theta,F}$ (respectively under $P_{\tilde\theta,\tilde F}$). Then since
the distribution of $Y_1$ is the same under $P_{\theta,F}$ and $P_{\tilde\theta,\tilde F}$, one gets that for any $t \in \mathbb{R}$,

\[ \phi_F(t)\, \phi_{\theta,1}(t) = \phi_{\tilde F}(t)\, \phi_{\tilde\theta,1}(t). \tag{2.2} \]

Similarly, for any $t \in \mathbb{R}$,

\[ \phi_F(t)\, \phi_{\theta,2}(t) = \phi_{\tilde F}(t)\, \phi_{\tilde\theta,2}(t). \tag{2.3} \]

Since the distribution of $(Y_1, Y_2)$ is the same under $P_{\theta,F}$ and $P_{\tilde\theta,\tilde F}$, one gets that for any
$t = (t_1, t_2) \in \mathbb{R}^2$,

\[ \phi_F(t_1)\, \phi_F(t_2)\, \Phi_\theta(t) = \phi_{\tilde F}(t_1)\, \phi_{\tilde F}(t_2)\, \Phi_{\tilde\theta}(t). \tag{2.4} \]

There exists a neighborhood $V$ of $0$ such that for all $t \in V$, $\phi_F(t) \neq 0$, so that (2.2), (2.3)
and (2.4) imply that for any $t = (t_1, t_2) \in V^2$,

\[ \Phi_\theta(t)\, \phi_{\tilde\theta,1}(t_1)\, \phi_{\tilde\theta,2}(t_2) = \Phi_{\tilde\theta}(t)\, \phi_{\theta,1}(t_1)\, \phi_{\theta,2}(t_2). \tag{2.5} \]

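Let $t_1$ be a fixed real number in $V$. $\Phi_\theta(t_1, t_2)$, $\phi_{\tilde\theta,2}(t_2)$, $\Phi_{\tilde\theta}(t_1, t_2)$, $\phi_{\theta,2}(t_2)$ have analytic
continuations for all complex numbers $z_2$, $\Phi_\theta(t_1, z_2)$, $\phi_{\tilde\theta,2}(z_2)$, $\Phi_{\tilde\theta}(t_1, z_2)$, $\phi_{\theta,2}(z_2)$, which are
entire functions, so that (2.5) holds with $z_2$ in place of $t_2$ for all $z_2$ in the complex plane $\mathbb{C}$ and
any $t_1 \in V$. Again, let $z_2$ be a fixed complex number in $\mathbb{C}$. $\Phi_\theta(t_1, z_2)$, $\phi_{\tilde\theta,1}(t_1)$, $\Phi_{\tilde\theta}(t_1, z_2)$,
$\phi_{\theta,1}(t_1)$ have analytic continuations $\Phi_\theta(z_1, z_2)$, $\phi_{\tilde\theta,1}(z_1)$, $\Phi_{\tilde\theta}(z_1, z_2)$, $\phi_{\theta,1}(z_1)$ which are entire
functions, so that (2.5) holds with $z_1$ in place of $t_1$ and $z_2$ in place of $t_2$ for all $(z_1, z_2) \in \mathbb{C}^2$.
Let now $Z$ be the set of zeros of $\phi_{\theta,1}$, $\tilde Z$ be the set of zeros of $\phi_{\tilde\theta,1}$, and fix $z_1 \in Z$. Then,
for any $z_2 \in \mathbb{C}$,

\[ \Phi_\theta(z_1, z_2)\, \phi_{\tilde\theta,1}(z_1)\, \phi_{\tilde\theta,2}(z_2) = 0. \tag{2.6} \]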

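We now prove that $z \mapsto \Phi_\theta(z_1, z)$ is not the null function. For any $z \in \mathbb{C}$,

\[ \Phi_\theta(z_1, z) = \sum_{\ell=1}^{k} \Big[ \sum_{j=1}^{k} Q_{j,\ell}\, e^{i m_j z_1} \Big] e^{i m_\ell z}. \]

Since $0 = m_1 < m_2 < \cdots < m_k$, if $\Phi_\theta(z_1, \cdot)$ was the null function, we would have for all
$\ell = 1, \ldots, k$

\[ \sum_{j=1}^{k} Q_{j,\ell}\, e^{i m_j z_1} = 0, \]

that is, $Q^T$ would annihilate the non-zero vector $(e^{i m_j z_1})_{1 \leq j \leq k}$,
which is impossible since $\det(Q) \neq 0$. Thus, $\Phi_\theta(z_1, \cdot)$ is an entire function which has
isolated zeros, $\phi_{\tilde\theta,2}(\cdot)$ also, and it is possible to choose $z_2$ in $\mathbb{C}$ such that $\Phi_\theta(z_1, z_2) \neq 0$ and
$\phi_{\tilde\theta,2}(z_2) \neq 0$. Then (2.6) leads to $\phi_{\tilde\theta,1}(z_1) = 0$, so that $Z \subset \tilde Z$. A symmetric argument
gives $\tilde Z \subset Z$, so that $Z = \tilde Z$. Moreover, $\phi_{\theta,1}$ and $\phi_{\tilde\theta,1}$ have growth order 1, so that using
Hadamard's factorization theorem (see Stein and Shakarchi (2003), Theorem 5.1) one gets
that there exists a polynomial $R$ of degree $\leq 1$ such that for all $z \in \mathbb{C}$,

\[ \phi_{\theta,1}(z) = e^{R(z)}\, \phi_{\tilde\theta,1}(z). \]

But using $\phi_{\theta,1}(0) = \phi_{\tilde\theta,1}(0) = 1$ we get that there exists a complex number $a$ such that
$\phi_{\theta,1}(z) = e^{az} \phi_{\tilde\theta,1}(z)$. Using now $0 = m_1 < m_2 < \cdots < m_k$ and $0 = \tilde m_1 < \tilde m_2 < \cdots < \tilde m_{\tilde k}$
we get that $\phi_{\theta,1} = \phi_{\tilde\theta,1}$. Similar arguments lead to $\phi_{\theta,2} = \phi_{\tilde\theta,2}$. Combining this with (2.5)
we obtain $\Phi_\theta = \Phi_{\tilde\theta}$, which in turn implies that $k = \tilde k$ and $\theta = \tilde\theta$. Thus, using (2.2), for all
$t \in \mathbb{R}$ such that $\phi_{\theta,1}(t) \neq 0$, $\phi_F(t) = \phi_{\tilde F}(t)$. Since $\phi_{\theta,1}$ has isolated zeros and $\phi_F$, $\phi_{\tilde F}$ are
continuous functions, one gets $\phi_F = \phi_{\tilde F}$, so that $F = \tilde F$. □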



3 Estimation of the parametric part 
3.1 Assumptions on the model 

Hereafter, we are given a sequence $(Y_i)_{i \in \mathbb{N}}$ of real random variables with distribution $P^*$. We
assume that (1.1) holds, with $(S_i)_{i \in \mathbb{N}}$ a sequence of non-observed random variables taking
values in $\{1, \ldots, k^*\}$. We denote by $F^*$ the common probability distribution of the $\epsilon_i$'s, and
$m^* \in \mathbb{R}^{k^*}$ the possible values of the $m_{S_i}$'s. We assume:

(A1) $(S_i, S_{i+1})$ converges in distribution to $Q^* \in \mathcal{Q}_{k^*}$.

For $\theta^* = (m^*, (Q^*_{i,j})_{(i,j) \neq (k^*,k^*)})$, $\theta^* \in \Theta^0_{k^*}$, and all differences $m^*_j - m^*_i$, $i, j = 1, \ldots, k^*$,
$i \neq j$, are distinct.






We do not assume that $k^*$ is known, so that the aim is to estimate $\theta^*$ and $k^*$ altogether.
Assumption (A1) implies that the marginal distributions in $Q^*$ are identical so that we
write from now on $\phi_{\theta^*} = \phi_{\theta^*,1} = \phi_{\theta^*,2}$.

The idea to estimate $\theta^*$ and $k^*$ is to use equation (2.5), which holds if and only if the
parameters are equal. Consider $w$ any probability density on $\mathbb{R}^2$ with compact support $S$,
positive on $S$ and with $0$ belonging to the interior of $S$; typically $S = [-a, a]^2$ for some
positive $a$. Define, for any integer $k$ and $\theta \in \Theta_k$:

\[ M(\theta) = \int_{\mathbb{R}^2} \left| \Phi_{\theta^*}(t_1, t_2)\, \phi_{\theta,1}(t_1)\, \phi_{\theta,2}(t_2) - \Phi_\theta(t_1, t_2)\, \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) \right|^2 \left| \phi_{F^*}(t_1)\, \phi_{F^*}(t_2) \right|^2 w(t_1, t_2)\, dt_1 dt_2. \tag{3.1} \]

We shall use $M(\theta)$ as a contrast function. Indeed, thanks to Theorem 2.1, $\theta \in \Theta_k^0$ is such
that $M(\theta) = 0$ if and only if $k = k^*$ and $\theta = \theta^*$.
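We estimate $M(\cdot)$ by

\[ M_n(\theta) = \int_{\mathbb{R}^2} \left| \hat\Phi_n(t_1, t_2)\, \phi_{\theta,1}(t_1)\, \phi_{\theta,2}(t_2) - \Phi_\theta(t_1, t_2)\, \hat\phi_{n,1}(t_1)\, \hat\phi_{n,2}(t_2) \right|^2 w(t_1, t_2)\, dt_1 dt_2, \tag{3.2} \]

where $\hat\Phi_n$ is an estimator of the characteristic function of the asymptotic distribution of
$(Y_t, Y_{t+1})$, $\hat\phi_{n,1}(t) = \hat\Phi_n(t, 0)$ and $\hat\phi_{n,2}(t) = \hat\Phi_n(0, t)$. One may take for instance the empirical
estimator

\[ \hat\Phi_n(t_1, t_2) = \frac{1}{n} \sum_{j=1}^{n-1} \exp\, i\,(t_1 Y_j + t_2 Y_{j+1}). \tag{3.3} \]

We require that $\hat\Phi_n$ is uniformly upper bounded; if $\hat\Phi_n$ is defined by (3.3) then it is uniformly
upper bounded by 1. Define, for any $t = (t_1, t_2) \in \mathbb{R}^2$,

\[ Z_n(t) = \sqrt{n} \left( \hat\Phi_n(t) - \Phi_{\theta^*}(t)\, \phi_{F^*}(t_1)\, \phi_{F^*}(t_2) \right). \]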
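As an illustration of the empirical estimator of the joint characteristic function, here is a minimal sketch. It assumes the estimator sums exp i(t1 Y_j + t2 Y_{j+1}) over the n - 1 consecutive pairs and divides by n; the function name and the toy data are hypothetical.

```python
import cmath


def empirical_cf(ys, t1, t2):
    """Empirical characteristic function of consecutive pairs (Y_j, Y_{j+1}).

    The sum runs over the n - 1 consecutive pairs and is normalised by n
    (an assumption of this sketch), so its modulus never exceeds 1.
    """
    n = len(ys)
    total = sum(cmath.exp(1j * (t1 * ys[j] + t2 * ys[j + 1])) for j in range(n - 1))
    return total / n


ys_demo = [0.3, 2.9, 3.2, -0.1, 0.4, 3.1]

# Joint estimate at one frequency pair, and the two marginal estimates
# obtained by setting one of the frequencies to zero.
phi_joint = empirical_cf(ys_demo, 0.5, -0.25)
phi_first = empirical_cf(ys_demo, 0.5, 0.0)
phi_second = empirical_cf(ys_demo, 0.0, -0.25)
```

Setting one argument to zero yields the marginal estimates, so a single function covers all three quantities that enter the contrast.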



Our main assumptions on the model and on the estimator $\hat\Phi_n$ are the following.

(A2) The process $(Z_n(t))_{t \in S}$ converges weakly to a Gaussian process $(Z(t))_{t \in S}$ in the
set of complex continuous functions on $S$ endowed with the uniform norm and with
covariance kernel $\Gamma(\cdot, \cdot)$.

(A3) There exist real numbers $E$ and $c$ (depending on $\theta^*$) such that for all $x \geq 0$ and
$n \geq 1$,

\[ P^*\Big( \sup_{t \in S} |Z_n(t)| \geq E + x \Big) \leq \exp(-c x^2). \]

(A2) will be used to obtain the asymptotic distribution of the estimator, and (A3) to
obtain non asymptotic deviation inequalities. Note that (A2) and (A3) are for instance
verified if we use (3.3), under stationarity and mixing conditions on the $Y_j$'s. This follows
by applying results of Doukhan et al. (1994), Doukhan et al. (1995) and Rio (2000).
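The empirical contrast can be approximated numerically. The sketch below is one possible discretisation, assuming a weight function that is uniform on the square [-a, a]^2 and a midpoint quadrature rule; both choices, like the function names and the 0-based indices, are assumptions of this example rather than prescriptions of the paper.

```python
import cmath


def empirical_cf(ys, t1, t2):
    """Empirical characteristic function of consecutive pairs, normalised by n."""
    n = len(ys)
    return sum(cmath.exp(1j * (t1 * ys[j] + t2 * ys[j + 1])) for j in range(n - 1)) / n


def model_cf(means, q, t1, t2):
    """Characteristic function of (m_{S_1}, m_{S_2}) when (S_1, S_2) has joint law q."""
    k = len(means)
    return sum(
        q[i][j] * cmath.exp(1j * (means[i] * t1 + means[j] * t2))
        for i in range(k)
        for j in range(k)
    )


def contrast(ys, means, q, a=1.0, grid=15):
    """Midpoint-rule approximation of the empirical contrast.

    The weight is uniform on [-a, a]^2, so the weighted integral reduces to
    the average of the integrand over the grid of cell midpoints.
    """
    step = 2.0 * a / grid
    points = [-a + (g + 0.5) * step for g in range(grid)]
    total = 0.0
    for t1 in points:
        model_first = model_cf(means, q, t1, 0.0)
        data_first = empirical_cf(ys, t1, 0.0)
        for t2 in points:
            diff = (
                empirical_cf(ys, t1, t2) * model_first * model_cf(means, q, 0.0, t2)
                - model_cf(means, q, t1, t2) * data_first * empirical_cf(ys, 0.0, t2)
            )
            total += abs(diff) ** 2
    return total / grid ** 2


# Degenerate check with a single population and constant data: the integrand
# is then constant, equal to ((n - 1) / n^2)^2 with n = 4.
value = contrast([2.0] * 4, means=[0.0], q=[[1.0]], grid=5)
```

The cost is of order grid^2 times n, which is acceptable for moderate sample sizes; a production version would cache the empirical characteristic function on the grid.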

3.2 Definition of the estimator 

Our contrast function verifies $M(\theta) = 0$ if and only if $\theta = \theta^*$ only when we restrict $\theta$
to belong to $\bigcup_{k \in \mathbb{N}} \Theta_k^0$. When minimization is performed over $\bigcup_{k \in \mathbb{N}} \Theta_k$ it may happen that
the minimizer is on the boundary. To get rid of this problem, we build our estimator $\hat\theta_n$
using a preliminary consistent estimator $\tilde\theta_n$, and then restrict the minimization using the
information given by $\tilde\theta_n$.

Define for any integer $k$, $I_k$ a positive continuous function on $\Theta_k^0$ tending to $+\infty$ on the
boundary of $\Theta_k^0$ or whenever $\|m\|$ tends to infinity. For instance one may take

\[ I_k\big(m, (Q_{i,j})_{(i,j) \neq (k,k)}\big) = -\log \det Q - \sum_{i=2}^{k} \log \frac{|m_i - m_{i-1}|}{(1 + \|m\|_\infty)^2}. \]

Let $(\hat k_n, \tilde\theta_n)$ be a minimizer over $\{(k, \theta) : k \in \mathbb{N}, \theta \in \Theta_k\}$ of

\[ C_n(k, \theta) = M_n(\theta) + \lambda_n \left[ J(k) + I_k(\theta) \right] \]

where $J : \mathbb{N} \to \mathbb{N}$ is an increasing function tending to infinity at infinity and $(\lambda_n)_{n \in \mathbb{N}}$ a
decreasing sequence of real numbers tending to $0$ at infinity such that

\[ \lim_{n \to +\infty} \sqrt{n}\, \lambda_n = +\infty. \tag{3.4} \]

Define now $\hat\theta_n$ as a minimizer of $M_n$ over

\[ \left\{ \theta \in \Theta_{\hat k_n} : I_{\hat k_n}(\theta) \leq 2 I_{\hat k_n}(\tilde\theta_n) \right\}. \]
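A barrier function of the kind used in the penalised criterion can be sketched as follows. The sketch assumes the example barrier is minus the log-determinant of Q minus the sum, over consecutive translation parameters, of the log of their gap divided by (1 + sup-norm of m) squared. It uses the absolute value of the determinant so that the logarithm is defined for every non singular Q, and it returns infinity on the boundary; both conventions, and the function names, are assumptions of this example.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def barrier(means, q):
    """Barrier that blows up when Q is singular or two translations coincide."""
    d = abs(determinant(q))
    gaps = [means[i] - means[i - 1] for i in range(1, len(means))]
    if d == 0.0 or any(g <= 0.0 for g in gaps):
        return math.inf
    scale = (1.0 + max(abs(m) for m in means)) ** 2
    return -math.log(d) - sum(math.log(g / scale) for g in gaps)


q_example = [[0.6, 1.0 / 15.0], [1.0 / 15.0, 4.0 / 15.0]]
print(barrier([0.0, 3.0], q_example))   # finite in the interior
print(barrier([0.0, 1e-6], q_example))  # much larger near the boundary
```

Restricting the final minimisation to a sub-level set of such a barrier keeps the minimiser away from parameters with merged populations or a singular joint matrix.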

In case $k^*$ is known, we may choose another estimator. Let $\mathcal{K}$ be a compact subset of
$\Theta^0_{k^*}$. We denote by $\bar\theta_n(\mathcal{K})$ a minimizer of $M_n$ over $\mathcal{K}$. This estimator will also be used as a
theoretical trick in the proof of the asymptotic distribution of $\hat\theta_n$.

3.3 Asymptotic results 

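Our first result gives the asymptotic distribution of $\hat\theta_n$. To define the asymptotic variance,
we denote by $\nabla M(\theta)$ the gradient of $M$ at point $\theta$ and by $D^2 M(\theta)$ the Hessian of $M$ at point $\theta$.
We also set $V$ the variance of the Gaussian vector

\[ \int \Big\{ C(t) \big[ Z(-t)\, \phi_{\theta^*}(-t_1)\, \phi_{\theta^*}(-t_2) - \Phi_{\theta^*}(-t) \big( Z(-t_1, 0)\, \phi_{\theta^*}(-t_2) + Z(0, -t_2)\, \phi_{\theta^*}(-t_1) \big) \big] \]
\[ \quad + C(-t) \big[ Z(t)\, \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) - \Phi_{\theta^*}(t) \big( Z(t_1, 0)\, \phi_{\theta^*}(t_2) + Z(0, t_2)\, \phi_{\theta^*}(t_1) \big) \big] \Big\}\, w(t)\, dt \]

where

\[ C(t) = \Phi_{\theta^*}(t)\, \nabla\big( \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) \big) - \nabla \Phi_{\theta^*}(t)\, \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2). \]

Theorem 3.1 Assume (A1), (A2), and (3.4). Then $D^2 M(\theta^*)$ is non singular, and for
any compact subset $\mathcal{K}$ of $\Theta^0_{k^*}$ such that $\theta^*$ lies in the interior of $\mathcal{K}$, $\sqrt{n}(\bar\theta_n(\mathcal{K}) - \theta^*)$ converges
in distribution to the centered Gaussian with variance

\[ \Sigma = D^2 M(\theta^*)^{-1}\, V\, D^2 M(\theta^*)^{-1}. \]

Moreover, $\sqrt{n}(\hat\theta_n - \theta^*)$ converges in distribution to the centered Gaussian with variance $\Sigma$.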

If one wants to use Theorem 3.1 to build confidence sets, one needs a consistent
estimator of $\Sigma$. Since $D^2 M$ is a continuous function of $\theta$, $D^2 M(\hat\theta_n)$ is a consistent
estimator of $D^2 M(\theta^*)$. Also, $V$ may be viewed as a continuous function of $\Gamma(\cdot, \cdot)$ and $\theta$, as
easy but tedious computations show. One may use empirical estimators of $\Gamma(\cdot, \cdot)$, which are
uniformly consistent under stationarity and mixing conditions, to get a consistent estimator
of $V$. This leads to a plug-in consistent estimator of $\Sigma$.

Another possible way to estimate $\Sigma$ is to use a bootstrap method, following for instance
Clémençon et al. (2009) when the hidden variables form a Markov chain.



When we have deviation inequalities for the process $Z_n$, we are able to provide deviation
inequalities for $\sqrt{n}(\bar\theta_n(\mathcal{K}) - \theta^*)$. Such inequalities are of interest in themselves; they will also
be used for proving adaptivity of our non parametric estimator in Section 4.

Theorem 3.2 Assume (A1) and (A3). Let $\mathcal{K}$ be a compact subset of $\Theta^0_{k^*}$ such that $\theta^*$ lies
in the interior of $\mathcal{K}$. Then there exist real numbers $c^*$, $M^*$, and an integer $n^*$ such that for
all $n \geq n^*$ and $M \geq M^*$,

\[ P^*\big( \sqrt{n}\, \| \bar\theta_n(\mathcal{K}) - \theta^* \| \geq M \big) \leq 8 \exp(-c^* M^2). \]

In particular, for any integer $p$,

\[ \sup_{n \geq 1} E_{P^*}\Big[ \big( \sqrt{n}\, \| \bar\theta_n(\mathcal{K}) - \theta^* \| \big)^p \Big] < +\infty. \]

4 Estimation of the non parametric part in the case of hidden Markov models

In this section we assume that $P^*$ is the distribution of a stationary ergodic hidden Markov
model (HMM for short), that is the sequence $(S_t)_{t \in \mathbb{N}}$ is a stationary ergodic Markov chain.
We also assume that the unknown distribution $F^*$ has density $f^*$ with respect to Lebesgue
measure. Thus the density $s^*$ of $Y_1$ writes

\[ s^*(y) = \sum_{j=1}^{k^*} \mu^*(j)\, f^*(y - m^*_j), \]

where $\mu^*(j) = \sum_{i=1}^{k^*} Q^*_{j,i}$, $1 \leq j \leq k^*$. We shall assume moreover:

(A4) For all $i, j = 1, \ldots, k^*$, $Q^*_{i,j} > 0$, and there exists $\delta > 0$ such that

\[ \int_{\mathbb{R}} [f^*(y)]^{1-\delta}\, dy < +\infty. \]

Notice that, if the observations form a stationary HMM and if for all $i, j = 1, \ldots, k^*$,
$Q^*_{i,j} > 0$, then the sequence is geometrically uniformly ergodic, and applying results of
Doukhan et al. (1994), Doukhan et al. (1995) and Rio (2000), (A2) and (A3) hold if we use
(3.3).

We propose to use model selection methods to estimate $f^*$ using penalized marginal
likelihood. We assume in this section that $k^*$ is known, and that we are given an estimator
$\hat\theta_n = ((\hat m_i)_{1 \leq i \leq k^*}, (\hat Q_{i,j})_{(i,j) \neq (k^*,k^*)}) = \bar\theta_n(\mathcal{K})$ of $\theta^*$ for some compact subset $\mathcal{K}$ of $\Theta^0_{k^*}$ such
that $\theta^*$ lies in the interior of $\mathcal{K}$. Let $\hat\mu(i) = \sum_{j=1}^{k^*} \hat Q_{i,j}$. Define for any density
function $f$ on $\mathbb{R}$

\[ \ell_n(f) = \frac{1}{n} \sum_{i=1}^{n} \log \Big[ \sum_{j=1}^{k^*} \hat\mu(j)\, f(Y_i - \hat m_j) \Big]. \]

Let $\mathcal{F}$ be the set of probability densities on $\mathbb{R}$. We shall use the model collection $(\mathcal{F}_p)_{p \geq 2}$ of
Gaussian mixtures with $p$ components as approximation of $\mathcal{F}$. Let us define for any integer
$p$



(4.1) 

where $B$ and $A_p$, $b_p$, $p \geq 2$, are positive real numbers, and where $\varphi_\beta$ is the Gaussian density
with variance $\beta^2$ given by $\varphi_\beta(x) = \exp(-x^2 / 2\beta^2) / (\beta \sqrt{2\pi})$. For any $p \geq 2$, let $\hat f_p$ be the
maximizer of $\ell_n(f)$ over $\mathcal{F}_p$. Define

\[ D_n(p) = -\ell_n(\hat f_p) + \mathrm{pen}(p, n). \]

Our model selection estimator $\hat f$ will be given by $\hat f_{\hat p}$ whenever $\hat p$ is a minimizer of $D_n$.
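The penalised criterion can be sketched in a few lines once the fitted densities are available. The sketch below assumes the pseudo log-likelihood is the average of the log of the translated mixture evaluated at the observations; the fitting step itself is not shown, and the function names, the dictionary of fitted densities and the penalty passed by the caller are illustrative assumptions.

```python
import math


def gaussian_mixture_density(x, weights, centers, sds):
    """Density at x of a Gaussian mixture with the given weights, centers and standard deviations."""
    return sum(
        w * math.exp(-((x - c) ** 2) / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))
        for w, c, s in zip(weights, centers, sds)
    )


def pseudo_loglik(ys, mu_hat, m_hat, f):
    """Average log of the translated mixture sum_j mu_hat[j] * f(y - m_hat[j])."""
    total = 0.0
    for y in ys:
        total += math.log(sum(mu * f(y - m) for mu, m in zip(mu_hat, m_hat)))
    return total / len(ys)


def select_order(ys, mu_hat, m_hat, fitted, pen):
    """Return the p minimising -pseudo_loglik(fitted[p]) + pen(p, n).

    fitted maps each candidate p to an already maximised density; pen is
    supplied by the caller, since its calibration is not fixed here.
    """
    n = len(ys)
    return min(fitted, key=lambda p: -pseudo_loglik(ys, mu_hat, m_hat, fitted[p]) + pen(p, n))


def standard_normal(x):
    return gaussian_mixture_density(x, [1.0], [0.0], [1.0])


ys_demo = [0.1, 2.8, 3.3, -0.4]
best_p = select_order(
    ys_demo,
    mu_hat=[0.6, 0.4],
    m_hat=[0.0, 3.0],
    fitted={2: standard_normal, 3: standard_normal},  # placeholder fits
    pen=lambda p, n: p * math.log(n) / n,              # hypothetical penalty
)
```

With identical placeholder fits the likelihood terms coincide, so the penalty alone decides and the smaller model is selected.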



4.1 Oracle inequality 

The following theorem says that a suitable choice of the penalty term $\mathrm{pen}(p, n)$ leads to an
estimator having good non asymptotic and asymptotic properties. In the following,

\[ \hat s_{\hat p} = \sum_{j=1}^{k^*} \hat\mu(j)\, \hat f_{\hat p}(\cdot - \hat m_j) \]

is the estimator of $s^*$,

\[ \mathcal{S}_p = \Big\{ \sum_{j=1}^{k^*} \mu^*(j)\, f(\cdot - m^*_j),\ f \in \mathcal{F}_p \Big\} \]

for any $p \geq 2$, $h^2(\cdot, \cdot)$ is the Hellinger distance and $K(\cdot, \cdot)$ the Kullback-Leibler divergence
between probability densities. For any $p \geq 2$, fix some $f_p \in \mathcal{F}_p$ and set
$s_p = \sum_{j=1}^{k^*} \mu^*(j)\, f_p(\cdot - m^*_j)$. Of course, to derive good behaviour of the estimator from the
oracle inequality, one will have to choose $f_p$ carefully.



Theorem 4.1 Assume (A1), (A3) and (A4). Let $(x_p)_{p \geq 2}$ be a sequence of positive real
numbers such that $\Sigma = \sum_{p \geq 2} e^{-x_p} < +\infty$. Then there exist positive real numbers $\kappa$ and $C$,
depending only on $Q^*$ and $\delta$ such that, as soon as



pen (p, n) > — I k*p 
n 



!■»;;/; -!■ ]<>» [ y ] + lo;;.!,, 



u -4 



one /ias 



with 



Br* [h 2 (s*,s p )] <c{ inf (if (/*, /„) + pen (p, n) + + 

(p>2 



E L W)/p(^i - 





The proof of Theorem 4.1 is postponed to Appendix C.

Notice that the constant in the so-called oracle inequality depends on $P^*$, so that the result
of Theorem 4.1 is not of real practical use. Also, the upper bound depends on $\hat\theta_n$, for which
the results in Section 3 hold for large enough $n$. However, Theorem 4.1 is the building block
to understand how to choose a penalty function and to prove adaptivity of our estimator.



4.2 Adaptive estimation 

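We prove now that $\hat s_{\hat p}$ is an adaptive estimator of $s^*$, and that, if $\max_j \mu^*(j) > \frac{1}{2}$, $\hat f_{\hat p}$ is
an adaptive estimator of $f^*$. Adaptivity will be proved on the following classes of regular
densities.

Let $y_0 > 0$, $c_0 > 0$, $M > 0$, $\tau > 0$, $C > 0$, $\lambda > 0$ and $L$ a positive polynomial function on
$\mathbb{R}$. Let also $\beta > 0$ and $\gamma > (3/2 - \beta)_+$. If we denote $\mathcal{P} = (y_0, c_0, M, \tau, C, \lambda, L)$, we define
$\mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})$ as the set of probability densities $f$ on $\mathbb{R}$ satisfying:

• $f$ is monotone on $(-\infty, -y_0)$ and on $(y_0, +\infty)$, and $\inf_{|y| \leq y_0} f(y) \geq c_0 > 0$.

• \[ \forall y \in \mathbb{R}, \quad f(y) \leq M e^{-\tau |y|}. \tag{4.2} \]

• $\log f$ is $\lfloor \beta \rfloor$ times continuously differentiable with derivatives $\ell_j$, $j \leq \beta$, satisfying for
all $x \in \mathbb{R}$ and all $|y - x| \leq \lambda$,

\[ | \ell_{\lfloor \beta \rfloor}(y) - \ell_{\lfloor \beta \rfloor}(x) | \leq \lfloor \beta \rfloor !\, L(x)\, |y - x|^{\beta - \lfloor \beta \rfloor} \]

and

\[ \int_{\mathbb{R}} | \ell_j(y) |^{\frac{2\beta + \gamma}{j}} f(y)\, dy \leq C. \]

We use $\hat s_{\hat p}$ where the penalty is set to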

pen (p, n) = — (k*p + x p ) log n. 
n 
Theorem 4.2 Assume (A1), (A3) and (A4). Then for any $\mathcal{P}$, $\beta > 1/2$ and $\gamma > (3/2 - \beta)_+$,
there exists $C(\beta, \gamma, \mathcal{P}) > 0$ such that

\[ \limsup_{n \to +\infty} \Big( \frac{n}{(\log n)^3} \Big)^{\frac{2\beta}{2\beta+1}} \sup_{f^* \in \mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})} E_{P^*}\big[ h^2(s^*, \hat s_{\hat p}) \big] \leq C(\beta, \gamma, \mathcal{P}). \]

Thus, $\hat s_{\hat p}$ is adaptive on the regularity $\beta$ of the density classes up to $(\log n)^{3\beta/(2\beta+1)}$, see
Maugis-Rabusseau and Michel (2012) for a lower bound of the asymptotic minimax risk in
the case of independent and identically distributed random variables. Using Theorem 4.2
we can also derive adaptive asymptotic rates for the minimax $L_1$-risk for the estimation of
$f^*$.

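Corollary 1 Assume (A1), (A3), (A4) and that $\max_j \mu^*(j) > \frac{1}{2}$. Then for any $\mathcal{P}$,
$\beta > 1/2$ and $\gamma > (3/2 - \beta)_+$,

\[ \limsup_{n \to +\infty} \Big( \frac{n}{(\log n)^3} \Big)^{\frac{\beta}{2\beta+1}} \sup_{f^* \in \mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})} E_{P^*}\big[ \| \hat f_{\hat p} - f^* \|_1 \big] \leq \frac{2 \sqrt{C(\beta, \gamma, \mathcal{P})}}{2 \max_j \mu^*(j) - 1}. \]

It is possible that the constraint $\max_j \mu^*(j) > 1/2$ is not sharp; however, note that the
Fourier transform of $s^*$ is expressed as $\phi_{\theta^*} \phi_{f^*}$ with $\phi_{\theta^*}(t) = \sum_{j=1}^{k^*} \mu^*(j)\, e^{i t m^*_j}$ and $\phi_{f^*}$ the
Fourier transform of $f^*$, and that $|\phi_{\theta^*}(t)| > 0$ for all $t \in \mathbb{R}$ if and only if $\max_j \mu^*(j) > 1/2$,
applying the main theorem of Moreno (1973).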
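Proof of Corollary 1

We shall use

\[ \| s^* - \hat s_{\hat p} \|_1 \leq 2 h(s^*, \hat s_{\hat p}), \]

together with

\[ \| s^* - \hat s_{\hat p} \|_1 = \Big\| \sum_{j=1}^{k^*} \mu^*(j)\, f^*(\cdot - m^*_j) - \sum_{j=1}^{k^*} \hat\mu(j)\, \hat f_{\hat p}(\cdot - \hat m_j) \Big\|_1 \]
\[ \geq \Big\| \sum_{j=1}^{k^*} \mu^*(j)\, (\hat f_{\hat p} - f^*)(\cdot - \hat m_j) \Big\|_1 - \| \hat\mu - \mu^* \|_1 - \Big\| \sum_{j=1}^{k^*} \mu^*(j) \big( f^*(\cdot - m^*_j) - f^*(\cdot - \hat m_j) \big) \Big\|_1 \]
\[ \geq \big| 2 \max_j \mu^*(j) - 1 \big|\, \| \hat f_{\hat p} - f^* \|_1 - \| \hat\mu - \mu^* \|_1 - \max_j \| f^*(\cdot - m^*_j) - f^*(\cdot - \hat m_j) \|_1 \]

which follows by using iteratively the triangle inequality. Using $\beta > 1/2$, Theorem 3.2 and
Theorem 4.2 we thus get that

\[ \limsup_{n \to +\infty} \Big( \frac{n}{(\log n)^3} \Big)^{\frac{\beta}{2\beta+1}} \sup_{f^* \in \mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})} E_{P^*}\big[ \| \hat f_{\hat p} - f^* \|_1 \big] \leq \frac{2 \sqrt{C(\beta, \gamma, \mathcal{P})}}{2 \max_j \mu^*(j) - 1} \]

as soon as, for each $j$,

\[ \lim_{n \to +\infty} \Big( \frac{n}{(\log n)^3} \Big)^{\frac{\beta}{2\beta+1}} \sup_{f^* \in \mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})} E_{P^*}\big[ \| f^*(\cdot - m^*_j) - f^*(\cdot - \hat m_j) \|_1 \big] = 0. \tag{4.3} \]

Now, since $f^* \in \mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})$ with $\beta > 1/2$, if $|\hat m_j - m^*_j| \leq \lambda$,

\[ | \log f^*(y - \hat m_j) - \log f^*(y - m^*_j) | \leq L(y - m^*_j)\, |\hat m_j - m^*_j|^{\beta \wedge 1}. \]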
Set M > 5^, and a > such that, if \y\ < n a , then L(y)\fhj - m*|^ A1 < 1. Observe

also that since 9 n stays in a compact set, for large enough n, if \y\ > n a , then for any j, 
|?/ — mjl > n a /2 and |?/ — m*| > n a /2. We obtain, using |e u — 1| < 2u for < u < 1: 



|r(--m*) -/*(•-%) 



jV in 



< 2 



A/ log 71 



|y|>« a /2 



-((3Al)/2 



L{y- m*)f*(y - m*)dy 
f*(y)dy + ^ne*-^ || >vjw log n/vW 



and (4.3) follows from Theorem 3.2, $\beta > 1/2$ and the fact that $f^* \in \mathcal{H}_{loc}(\beta, \gamma, \mathcal{P})$ has
exponentially decreasing tails. □

4.3 Computation of $\hat f_p$

The computation of $\hat f_p$ may be performed using the EM-algorithm, which is particularly
simple for Gaussian mixtures. Indeed, for $f = \sum_{i=1}^{p} \pi_i\, \varphi_{\beta_i}(\cdot - \alpha_i)$, $\sum_{j=1}^{k^*} \hat\mu(j)\, f(\cdot - \hat m_j)$ is
a mixture of $p k^*$ Gaussian densities $\varphi_{\beta_i}(\cdot - \alpha_i - \hat m_j)$ with weights $\pi_i\, \hat\mu(j)$. Starting from an
initial point $((\pi_i^0)_{1 \leq i \leq p}, (\alpha_i^0)_{1 \leq i \leq p}, (\beta_i^0)_{1 \leq i \leq p})$, the $l$-th EM iteration may be easily computed
as


Ei'=i Ej=i EILi /* C?) 4^, ( r * ~ % ~ a '0 
Ej=i ELi ( y t - %) m CO Ti^j (*t - % - a i) 



i = l,...,p, 



Ej=iEr=iAi(i)^^ 



,p, 



where for any real numbers $c_1$, $c_2$, $T_{c_1, c_2}$ is the truncation function: $T_{c_1, c_2}(x) = x\, \mathbb{1}_{c_1 \leq x \leq c_2} +
c_1 \mathbb{1}_{x < c_1} + c_2 \mathbb{1}_{x > c_2}$, and



J+i 



Ti 



b n ,B 



EjU ELi ( y « - ™j - v l) 2 ft (i) ^y/jj ( y « - % - a D 



i,...,p. 
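Two ingredients of this computation are easy to sketch: the expansion of the translated p-component mixture into its p times k* Gaussian components, and the truncation applied to the updated parameters. The function names and the numerical values below are illustrative; the full EM updates are not reproduced here.

```python
def expand_components(pi, alpha, beta, mu_hat, m_hat):
    """List the p * k components (weight, center, standard deviation).

    Component (i, j) has weight pi[i] * mu_hat[j], center alpha[i] + m_hat[j]
    and standard deviation beta[i].
    """
    return [
        (pi_i * mu_j, a_i + m_j, b_i)
        for pi_i, a_i, b_i in zip(pi, alpha, beta)
        for mu_j, m_j in zip(mu_hat, m_hat)
    ]


def truncate(x, c1, c2):
    """Truncation: x if c1 <= x <= c2, c1 if x < c1, c2 if x > c2."""
    return min(max(x, c1), c2)


components = expand_components(
    pi=[0.5, 0.5],
    alpha=[0.0, 1.0],
    beta=[1.0, 2.0],
    mu_hat=[0.3, 0.7],
    m_hat=[0.0, 3.0],
)
```

Since the translations and their weights are held fixed at their estimated values, each EM step only updates the p triples of mixture parameters, and the truncation keeps them inside the constraints defining the model.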
A Proof of Theorem 3.1

First of all, we prove a lemma we shall use several times. Using $\big| |A|^2 - |B|^2 \big| \leq |A - B| (|A| + |B|)$
and the fact that characteristic functions are uniformly upper bounded by 1, we get
that for any integer $k$ and any $\theta \in \Theta_k$,

\[ | M_n(\theta) - M(\theta) | \leq 2 \int \Big\{ \big| \hat\Phi_n(t_1, t_2) - \Phi_{\theta^*}(t_1, t_2)\, \phi_{F^*}(t_1)\, \phi_{F^*}(t_2) \big| \]
\[ \quad + \big| \hat\phi_{n,1}(t_1)\, \hat\phi_{n,2}(t_2) - \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2)\, \phi_{F^*}(t_1)\, \phi_{F^*}(t_2) \big| \Big\}\, w(t_1, t_2)\, dt_1 dt_2. \]

The upper bound does not depend on $k$ and $\theta$, $\hat\Phi_n$ is uniformly upper bounded, and we get

\[ \sup_{k \geq 2,\ \theta \in \Theta_k} | M_n(\theta) - M(\theta) | = O\Big( \sup_{t \in S} \frac{|Z_n(t)|}{\sqrt{n}} \Big) = O_{P^*}(1/\sqrt{n}) \tag{A.1} \]

which together with Theorem 2.1 gives

Lemma 1 If $(k_n, \theta_n)_n$, $\theta_n \in \Theta_{k_n}$, is a random sequence such that there exists an integer
$K \geq k^*$, and a compact subset $T$ of $\bigcup_{k \leq K} \Theta_k^0$ such that

\[ P^*(k_n \leq K \text{ and } \theta_n \in T) \to 1 \quad \text{and} \quad M_n(\theta_n) = o_{P^*}(1), \]

then

\[ P^*(k_n = k^*) \to 1 \quad \text{and} \quad \theta_n = \theta^* + o_{P^*}(1). \]

Since $C_n(\hat k_n, \tilde\theta_n) \leq C_n(k^*, \theta^*)$ and $M_n$ is a non negative function, we get

\[ J(\hat k_n) + I_{\hat k_n}(\tilde\theta_n) \leq \left[ J(k^*) + I_{k^*}(\theta^*) \right] + \frac{M_n(\theta^*) - M(\theta^*)}{\lambda_n}, \]

so that using (A.1), assumption (A2) and (3.4) we get

\[ J(\hat k_n) + I_{\hat k_n}(\tilde\theta_n) \leq \left[ J(k^*) + I_{k^*}(\theta^*) \right] + o_{P^*}(1). \tag{A.2} \]

Also,

\[ M_n(\tilde\theta_n) \leq M_n(\theta^*) + \lambda_n \left[ J(k^*) + I_{k^*}(\theta^*) \right], \]

so that

\[ M_n(\tilde\theta_n) = o_{P^*}(1). \]

Thus, using (A.2) and Lemma 1,

\[ P^*(\hat k_n = k^*) \to 1 \quad \text{and} \quad \tilde\theta_n = \theta^* + o_{P^*}(1). \tag{A.3} \]

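Set now $\mathcal{K} = \{ \theta \in \Theta_{k^*} : I_{k^*}(\theta) \leq 4 I_{k^*}(\theta^*) \}$. $\mathcal{K}$ is a compact subset of $\Theta^0_{k^*}$. Let $E_n$ be
the event ($\hat k_n = k^*$ and $\hat\theta_n = \bar\theta_n(\mathcal{K})$). Using Lemma 1 we get that $\bar\theta_n(\mathcal{K})$ is a consistent
estimator of $\theta^*$, and using (A.3) and Lemma 1 we get also that $\hat\theta_n$ is a consistent estimator
of $\theta^*$, so $M_n$ has the same minimizer on $\mathcal{K}$ and on $\{ I_{\hat k_n}(\theta) \leq 2 I_{\hat k_n}(\tilde\theta_n) \}$, with probability
tending to 1, since it belongs to a neighbourhood of $\theta^*$. Thus, $P^*(E_n) \to 1$. Now, since

\[ \hat\theta_n = \bar\theta_n(\mathcal{K})\, \mathbb{1}_{E_n} + \hat\theta_n\, \mathbb{1}_{E_n^c}, \]

Theorem 3.1 follows as soon as we prove that $\sqrt{n}(\bar\theta_n(\mathcal{K}) - \theta^*)$ converges in distribution to
the centered Gaussian with variance $\Sigma$. But this is a straightforward consequence of

\[ D^2 M_n(\theta_n) \big( \bar\theta_n(\mathcal{K}) - \theta^* \big) = -\nabla M_n(\theta^*), \]

for some $\theta_n \in \Theta_{k^*}$ such that $\| \theta_n - \theta^* \| \leq \| \bar\theta_n(\mathcal{K}) - \theta^* \|$, the consistency of $\bar\theta_n(\mathcal{K})$ and the
following Lemma.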

Lemma 2 Assume (A1) and (A2). Then

• $\sqrt{n}\, \nabla M_n(\theta^*)$ converges in distribution to a centered Gaussian with variance $V$.

• $D^2 M(\theta^*)$ is non singular, and for any random variable $\theta_n \in \Theta_{k^*}$ converging in $P^*$-probability
to $\theta^*$, one has

\[ D^2 M_n(\theta_n) = D^2 M(\theta^*) + o_{P^*}(1). \]

Proof of Lemma 2

First notice that, in every formula, taking the conjugate of any involved function at point
$t$ is the same as taking the function at point $-t$. This is also verified for derivatives. Write
now for any $\theta \in \Theta_{k^*}$ and any $t = (t_1, t_2)$

\[ G_n(\theta, t) = \hat\Phi_n(t)\, \phi_{\theta,1}(t_1)\, \phi_{\theta,2}(t_2) - \Phi_\theta(t)\, \hat\phi_{n,1}(t_1)\, \hat\phi_{n,2}(t_2) \]




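so that, if $\nabla G_n(\theta, t)$ denotes the gradient of $G_n$ with respect to $\theta$ at point $(\theta, t)$, one has

\[ \nabla M_n(\theta^*) = \int \left[ \nabla G_n(\theta^*, t)\, G_n(\theta^*, -t) + \nabla G_n(\theta^*, -t)\, G_n(\theta^*, t) \right] w(t)\, dt. \]

Now, writing $\hat\Phi_n(t) = Z_n(t)/\sqrt{n} + \Phi_{\theta^*}(t)\, \phi_{F^*}(t_1)\, \phi_{F^*}(t_2)$ and using (A2) one gets easily

\[ \sqrt{n}\, \nabla M_n(\theta^*) = \int \Big\{ \phi_{F^*}(t_1)\, \phi_{F^*}(t_2) \big[ \Phi_{\theta^*}(t)\, \nabla\big( \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) \big) - \nabla \Phi_{\theta^*}(t)\, \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) \big] \]
\[ \quad \times \big[ Z_n(-t)\, \phi_{\theta^*}(-t_1)\, \phi_{\theta^*}(-t_2) - \Phi_{\theta^*}(-t) \big( Z_n(-t_1, 0)\, \phi_{\theta^*}(-t_2) + Z_n(0, -t_2)\, \phi_{\theta^*}(-t_1) \big) \big] \]
\[ \quad + \phi_{F^*}(-t_1)\, \phi_{F^*}(-t_2) \big[ \Phi_{\theta^*}(-t)\, \nabla\big( \phi_{\theta^*}(-t_1)\, \phi_{\theta^*}(-t_2) \big) - \nabla \Phi_{\theta^*}(-t)\, \phi_{\theta^*}(-t_1)\, \phi_{\theta^*}(-t_2) \big] \]
\[ \quad \times \big[ Z_n(t)\, \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) - \Phi_{\theta^*}(t) \big( Z_n(t_1, 0)\, \phi_{\theta^*}(t_2) + Z_n(0, t_2)\, \phi_{\theta^*}(t_1) \big) \big] \Big\}\, w(t)\, dt + O_{P^*}\Big( \frac{1}{\sqrt{n}} \Big) \]

and the convergence in distribution of $\sqrt{n}\, \nabla M_n(\theta^*)$ to a centered Gaussian with variance $V$
follows.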

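Similar computation gives that for any $\theta \in \Theta_{k^*}$

\[ D^2 M_n(\theta) - D^2 M_n(\theta^*) = \int | \hat\Phi_n(t) |^2 \left[ A_1(t, \theta) - A_1(t, \theta^*) \right] w(t)\, dt \]
\[ \quad + \int | \hat\Phi_n(t_1, 0) |^2\, | \hat\Phi_n(0, t_2) |^2 \left[ A_2(t, \theta) - A_2(t, \theta^*) \right] w(t)\, dt \]
\[ \quad + \mathrm{Re} \Big( \int \hat\Phi_n(-t)\, \hat\Phi_n(t_1, 0)\, \hat\Phi_n(0, t_2) \left[ A_3(t, \theta) - A_3(t, \theta^*) \right] w(t)\, dt \Big) \]

for matrix-valued functions $A_1(t, \theta)$, $A_2(t, \theta)$, $A_3(t, \theta)$ that are, in a neighborhood of $\theta^*$,
continuous in the variable $\theta$ for all $t$ and uniformly upper bounded. Thus $D^2 M_n(\theta_n) -
D^2 M_n(\theta^*)$ converges in $P^*$-probability to $0$ whenever $\theta_n$ is a random variable converging in
$P^*$-probability to $\theta^*$.

Finally, note that at point $\theta^*$ the Hessian of $M$ simplifies into:

\[ D^2 M(\theta^*) = 2 \int H(t)\, H(-t)^T\, | \phi_{F^*}(t_1)\, \phi_{F^*}(t_2) |^2\, w(t)\, dt, \]

with

\[ H(t) = \Phi_{\theta^*}(t) \big( \phi_{\theta^*}(t_1)\, \nabla \phi_{\theta^*}(t_2) + \nabla \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2) \big) - \nabla \Phi_{\theta^*}(t)\, \phi_{\theta^*}(t_1)\, \phi_{\theta^*}(t_2). \]

Denote by $H_{m_j}(t)$, $j = 2, \ldots, k^*$, $H_{Q_{j_1, j_2}}(t)$, $j_1, j_2 = 1, \ldots, k^*$, $(j_1, j_2) \neq (k^*, k^*)$, the
components of the vector $H(t)$. Positive definiteness of the second derivative of $M$ at $\theta^*$
can thus be established by proving that, if for all $t \in S$,

\[ \sum_{j=2}^{k^*} U_{m_j} H_{m_j}(t) + \sum_{(j_1, j_2) \neq (k^*, k^*)} U_{j_1, j_2} H_{Q_{j_1, j_2}}(t) = 0 \tag{A.4} \]

then

\[ U_{m_j} = 0, \ j = 2, \ldots, k^*, \qquad U_{j_1, j_2} = 0, \ j_1, j_2 = 1, \ldots, k^*, \ (j_1, j_2) \neq (k^*, k^*). \]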

By linear independence of the functions $e^{ita}$ and $te^{itb}$ this implies in particular that for all
$t = (t_1,t_2)$,

$$\begin{aligned}
& \sum_{j_1,\ldots,j_4=1}^{k^*} U_{m_{j_1}}\,\mu^*(j_1)\,\mu^*(j_2)\,Q^*_{j_3,j_4}\,e^{it_1(m^*_{j_1}+m^*_{j_3}) + it_2(m^*_{j_2}+m^*_{j_4})} \\
& \qquad = \sum_{j_1,\ldots,j_4=1}^{k^*} U_{m_{j_1}}\,\mu^*(j_2)\,\mu^*(j_3)\,Q^*_{j_1,j_4}\,e^{it_1(m^*_{j_1}+m^*_{j_3}) + it_2(m^*_{j_2}+m^*_{j_4})} \qquad (A.5)
\end{aligned}$$
with $U_{m_1} = 0$. The smallest possible term $m^*_{j_1} + m^*_{j_3}$ with $j_1 > 1$ is equal to $m^*_2 = m^*_2 + m^*_1$,
setting $j_1 = 2$ and $j_3 = 1$ only. Thus (A.5) implies that

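$$U_{m_2}\,\mu^*(2) \sum_{j_2,j_4=1}^{k^*} \mu^*(j_2)\,Q^*_{1,j_4}\,e^{it_2(m^*_{j_2}+m^*_{j_4})} = U_{m_2}\,\mu^*(1) \sum_{j_2,j_4=1}^{k^*} \mu^*(j_2)\,Q^*_{2,j_4}\,e^{it_2(m^*_{j_2}+m^*_{j_4})}$$

for all $t_2$, i.e.

$$U_{m_2}\,\mu^*(2)\,\phi_{\theta^*}(t_2) \sum_{j_4=1}^{k^*} Q^*_{1,j_4}\,e^{it_2 m^*_{j_4}} = U_{m_2}\,\mu^*(1)\,\phi_{\theta^*}(t_2) \sum_{j_4=1}^{k^*} Q^*_{2,j_4}\,e^{it_2 m^*_{j_4}}.$$

Since $\phi_{\theta^*}$ has only isolated zeros this is satisfied if and only if

$$U_{m_2}\,\mu^*(2) \sum_{j_4=1}^{k^*} Q^*_{1,j_4}\,e^{it_2 m^*_{j_4}} = U_{m_2}\,\mu^*(1) \sum_{j_4=1}^{k^*} Q^*_{2,j_4}\,e^{it_2 m^*_{j_4}}.$$

Thus (A.5) is satisfied only if either $U_{m_2} = 0$ or $\mu^*(2)\,Q^*_{1,j} = \mu^*(1)\,Q^*_{2,j}$ for all $j$. The latter
is impossible since $Q^*$ is non singular, thus $U_{m_2} = 0$ and (A.5) becomes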

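$$\begin{aligned}
& \sum_{j_1=3}^{k^*}\ \sum_{j_2,j_3,j_4=1}^{k^*} U_{m_{j_1}}\,\mu^*(j_1)\,\mu^*(j_2)\,Q^*_{j_3,j_4}\,e^{it_1(m^*_{j_1}+m^*_{j_3}) + it_2(m^*_{j_2}+m^*_{j_4})} \\
& \qquad = \sum_{j_1=3}^{k^*}\ \sum_{j_2,j_3,j_4=1}^{k^*} U_{m_{j_1}}\,\mu^*(j_2)\,\mu^*(j_3)\,Q^*_{j_1,j_4}\,e^{it_1(m^*_{j_1}+m^*_{j_3}) + it_2(m^*_{j_2}+m^*_{j_4})}.
\end{aligned}$$

The smallest possible value for $m^*_{j_1} + m^*_{j_3}$ is then $m^*_3$, which is obtained with the only
configuration $j_1 = 3$, $j_3 = 1$. The same argument as before leads to $U_{m_3} = 0$. Iteration of
the argument leads to $U_{m_j} = 0$ for all $j = 1,\ldots,k^*$. We now study the derivatives associated
to $Q$. We write $U$ the $k^* \times k^*$-matrix whose components are $U_{j_1,j_2}$ for $(j_1,j_2) \neq (k^*,k^*)$
and $U_{k^*,k^*} = -\sum_{(j_1,j_2)\neq(k^*,k^*)} U_{j_1,j_2}$. Then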

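$$\sum_{(j_1,j_2)\neq(k^*,k^*)} U_{j_1,j_2}\,\nabla_{Q_{j_1,j_2}}\Phi_{\theta^*}(t) = V(t_1)^T\,U\,V(t_2)$$

where for any $t \in \mathbb{R}$, $V(t) = \big((e^{itm^*_j})_{j=1,\ldots,k^*}\big)^T$, and

$$\sum_{(j_1,j_2)\neq(k^*,k^*)} U_{j_1,j_2}\,\nabla_{Q_{j_1,j_2}}\phi_{\theta^*}(t_1) = V(t_1)^T\,U\,\mathbb{1}$$

with $\mathbb{1} = (1,\ldots,1)^T \in \mathbb{R}^{k^*}$, since $\phi_{\theta^*}(t_1) = V(t_1)^T Q^* \mathbb{1}$ and $\Phi_{\theta^*}(t) = V(t_1)^T Q^* V(t_2)$. We
can then express (A.4) as

$$\begin{aligned}
V(t_1)^T \big[ & Q^* V(t_2)V(t_2)^T U \mathbb{1}\mathbb{1}^T (Q^*)^T + Q^* V(t_2)V(t_2)^T Q^* \mathbb{1}\mathbb{1}^T U^T \\
& - U V(t_2)V(t_2)^T Q^* \mathbb{1}\mathbb{1}^T (Q^*)^T \big] V(t_1) = 0. \qquad (A.6)
\end{aligned}$$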

Note also that since all differences $m^*_{j_1} - m^*_{j_2}$, $j_1 \neq j_2$, are distinct, if $A$ is a $k^* \times k^*$-matrix
and $I$ is an open subset of $\mathbb{R}$,

$$\left[\forall t \in I,\ V(t)^T A\,V(t) = 0\right] \Longrightarrow A + A^T = 0. \qquad (A.7)$$
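The role of the symmetric part in (A.7) can be checked numerically. The sketch below is only an illustration under stated assumptions: the translations `m`, the matrices `A_skew` and `A_other`, and the grid `ts` are hypothetical values chosen so that all pairwise sums of translations are distinct. It evaluates the bilinear form $V(t)^T A V(t)$ (plain transpose, no conjugation) and shows that it vanishes for an antisymmetric matrix but not once a symmetric component is added.

```python
import cmath

def V(t, m):
    """Vector V(t) = (exp(i * t * m_j))_j for translation parameters m."""
    return [cmath.exp(1j * t * mj) for mj in m]

def quad_form(A, t, m):
    """Bilinear form V(t)^T A V(t), with a plain (non-conjugate) transpose."""
    v = V(t, m)
    k = len(m)
    return sum(A[a][b] * v[a] * v[b] for a in range(k) for b in range(k))

def symmetric_part_is_zero(A, tol=1e-12):
    """True when A + A^T = 0 up to the tolerance."""
    k = len(A)
    return all(abs(A[a][b] + A[b][a]) <= tol for a in range(k) for b in range(k))

# Hypothetical translations: pairwise sums 0, 1, 2, 2.5, 3.5, 5 are all distinct.
m = [0.0, 1.0, 2.5]

# Antisymmetric matrix: A + A^T = 0, so the form vanishes for every t.
A_skew = [[0, 2, -1], [-2, 0, 3], [1, -3, 0]]
# Same matrix with one diagonal entry added: the form equals exp(2i * t * m_2).
A_other = [[0, 2, -1], [-2, 1, 3], [1, -3, 0]]

ts = [0.1 * j for j in range(1, 30)]  # grid inside an open interval
max_skew = max(abs(quad_form(A_skew, t, m)) for t in ts)
max_other = max(abs(quad_form(A_other, t, m)) for t in ts)
```

Here `max_skew` is zero up to rounding while `max_other` has modulus one, which matches the direction of (A.7): only matrices with vanishing symmetric part can annihilate the form on an open set.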

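Then (A.6) implies

$$\begin{aligned}
& Q^* V(t_2)V(t_2)^T U \mathbb{1}\mathbb{1}^T (Q^*)^T + Q^* \mathbb{1}\mathbb{1}^T U^T V(t_2)V(t_2)^T (Q^*)^T \\
& + Q^* V(t_2)V(t_2)^T Q^* \mathbb{1}\mathbb{1}^T U^T + U \mathbb{1}\mathbb{1}^T (Q^*)^T V(t_2)V(t_2)^T (Q^*)^T \\
& - U V(t_2)V(t_2)^T Q^* \mathbb{1}\mathbb{1}^T (Q^*)^T - Q^* \mathbb{1}\mathbb{1}^T (Q^*)^T V(t_2)V(t_2)^T U^T = 0. \qquad (A.8)
\end{aligned}$$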
Recall also that $\mathbb{1}^T U \mathbb{1} = 0$ and that $Q^*\mathbb{1} = \mu^*$. Note that $U\mathbb{1} = \alpha\mu^*$ with $\alpha \in \mathbb{R}$ if and
only if $\alpha = 0$ since $\mathbb{1}^T U \mathbb{1} = 0$ while $\mathbb{1}^T\mu^* = 1$. Therefore if $U\mathbb{1} \neq 0$ there exists $w \in \mathbb{R}^{k^*}$
such that $w^T(U\mathbb{1}) \neq 0$ while $(\mu^*)^T w = 0$. Multiplying the above equality on the left by $w^T$
and on the right by $w$ leads to

$$w^T Q^* V(t_2)V(t_2)^T (\mu^*)(U\mathbb{1})^T w = 0$$

for all $t_2$ in an open set. Using (A.7) again and since $(U\mathbb{1})^T w \neq 0$ we get that

$$\mu^*\left[(Q^*)^T w\right]^T + \left[(Q^*)^T w\right](\mu^*)^T = 0.$$

Since $\mu^*(j) > 0$ for all $j$ this implies that $(Q^*)^T w = 0$, which is impossible since $Q^*$ has full
rank. Therefore $U\mathbb{1} = 0$ and (A.8) becomes $V(t_2)^T\mu^*\left[U V(t_2)(\mu^*)^T + \mu^* V(t_2)^T U^T\right] = 0$,
that is $U V(t_2)(\mu^*)^T + \mu^* V(t_2)^T U^T = 0$ for all $t_2$ in an open set. Multiplying on the left by
$\mathbb{1}^T$ implies that $U V(t_2) = 0$ for all $t_2$ in an open set so that $U = 0$. □

B Proof of Theorem 3.2 

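Define for any $\theta \in \Theta_{k^*}$, $L_n(\theta) = M_n(\theta) - M(\theta)$. Then, since $M_n(\hat\theta_n(\mathcal{K})) \le M_n(\theta^*)$, one
easily gets

$$M(\hat\theta_n(\mathcal{K})) - M(\theta^*) \le \left|L_n(\hat\theta_n(\mathcal{K})) - L_n(\theta^*)\right|.$$
Define for any $t = (t_1, t_2)$ and any $\theta$

$$G(\theta,t) = \left\{\Phi_{\theta^*}(t)\,\phi_{\theta,1}(t_1)\,\phi_{\theta,2}(t_2) - \Phi_\theta(t)\,\phi_{\theta^*,1}(t_1)\,\phi_{\theta^*,2}(t_2)\right\}\phi_{F^*}(t_1)\,\phi_{F^*}(t_2)$$

and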

B n (0, t) = F * (t x ) F * (ta) ( (ti) <^, 2 (f 2 ) 



Z n (h,Q) A Z n (0,t 2 ) ^ Z n (t u 0)Z n (0,t 2 ) 
7= — 00,2 (raj H 7= — 9>0,i (ei) H 



-*« (t) 

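Writing $\hat\Phi_n(t) = \frac{Z_n(t)}{\sqrt{n}} + \Phi_{\theta^*}(t)\,\phi_{F^*}(t_1)\,\phi_{F^*}(t_2)$ one gets

$$L_n(\theta) = \int \left(\left[B_n(\theta,t) + G(\theta,t)\right]\left[B_n(\theta,-t) + G(\theta,-t)\right] - |G(\theta,t)|^2\right) w(t)\,dt.$$
Since $G(\theta^*,t) = 0$ for all $t$ we obtain

$$L_n(\theta) - L_n(\theta^*) = \int \left\{|B_n(\theta,t)|^2 - |B_n(\theta^*,t)|^2 + B_n(\theta,t)\,G(\theta,-t) + B_n(\theta,-t)\,G(\theta,t)\right\} w(t)\,dt$$

which gives

$$\left|L_n(\theta) - L_n(\theta^*)\right| \le \int \left\{|B_n(\theta,t) - B_n(\theta^*,t)|\,|B_n(\theta,t) + B_n(\theta^*,t)| + 2\,|B_n(\theta,t)|\,|G(\theta,t) - G(\theta^*,t)|\right\} w(t)\,dt$$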

which leads to 

$$M(\hat\theta_n(\mathcal{K})) - M(\theta^*) \le C\,W_n\,\|\hat\theta_n(\mathcal{K}) - \theta^*\| \qquad (B.1)$$
for some constant $C$ and any integer $n$, and with

r k y 2 y 3 y 4 1 
w n = \ + — + + -|- , y„ = sup \z n (t)| . 

tes 
Observe now that, since $D^2M$ is continuous and $D^2M(\theta^*)$ is non singular, there exist $\lambda > 0$
and $\alpha > 0$ such that, if $\|\theta - \theta^*\| \le \alpha$, then $M(\theta) - M(\theta^*) \ge \frac{\lambda}{2}\|\theta - \theta^*\|^2$. Moreover, there
exists $\delta > 0$ such that, if $\theta \in \mathcal{K}$ is such that $\|\theta - \theta^*\| \ge \alpha$, then $M(\theta) - M(\theta^*) \ge \delta$. Using
(B.1) we obtain that for any real number $M$ large enough,

p* (V^p n (/c) - 0*11 > m) < p* (V„ > 2CA * (iC) ) + r (ynw n > 

where $M(\mathcal{K}) = \sup_{\theta\in\mathcal{K}} \|\theta\|$. This last equation together with Assumption (A3) gives the
Theorem.



C Proof of Theorem 4.1 



The proof follows the general methodology for model selection developed by Massart (2007).
To prove Theorem 4.1 and Theorem 4.2, we will use a concentration inequality we state now. 
Let us introduce some notations. For any real function $f$, denote

$$G_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[f(Y_i) - \int f\,dP^*\right].$$



Lemma 3 Assume (A4). Let $\mathcal{F}$ be a class of real functions, and $F$ such that, for any
$f \in \mathcal{F}$, $|f| \le F$. Assume that there exist $c(F) > 0$ and $C(F) > 0$ such that $\forall j = 1,\ldots,k^*$,
$|g(j)| \le C(F)$ where $g$ is defined by

$$g(j) = \ln E_{P^*}\left\{\exp\left[2c(F)^{-1}\,|F(Y_2)|\right] \,\middle|\, S_1 = j\right\}.$$

Then there exist universal constants $C_1$, $C_2$, $K_1$, $K_2$ and a constant $C^*$ depending only on
$Q^*$ such that



¥* ^/n sup G n f > K^Ep* sup G n f + C x t^ + C 2 C*c{F)C(F)x 

\ J J 

< K2 exp {—a;} 

where $\tau^2 = \sup_{f\in\mathcal{F}} E_{P^*} f^2(Y_1)$.
Proof of Lemma 3

The lemma is an application of Theorem 7 in Adamczak and Bednorz (2012) to the
stationary Markov chain $(X_i)_{i\ge1} = (S_i, Y_i)_{i\ge1}$ and functions $f(s,y) := f(y)$. Then, with
the notations of Adamczak and Bednorz (2012) we get that:



• m = 1, 

• the small set C is the whole space, 

• the minorizing probability measure $\nu$ is that of $(S_i, Y_i)_{i\ge1}$ with $(S_i)_i$ i.i.d. with uniform
distribution, and $\delta = \min_{i,j} Q^*_{i,j}$,

• since $C$ is the whole space, the return times are $\sigma(i) = i$, so that $s_i(f) = f(Y_i)$, thus the
$\sigma^2$ of Theorem 7 is just $\sup_f E_{P^*}(f^2(Y_1))$.

Using the specific assumption of the lemma, taking $\alpha = 1$, we can apply Corollary 1 of
Adamczak and Bednorz (2012), to get (with their notations again)

$$a, b, c \le C^*\,c(F)\,C(F)$$
for some constant $C^* > 0$ depending only on $\min_{i,j} Q^*_{i,j}$. □
For any p>2, define



k* k* 

S P = { 5> U) f (• - mj) , / G T v , \ mj \ < M(/C), £ /x (i) = 1, 

a*(j)>o, ./ i /."I . 



so that ~s p £ S p . We now fix, for any p > 2, some s p G Sp such that: 



Vt G 5 P , 

For any p > 2 and any cr > 0, define 



< 2||Vs* - V^|| 



(CI) 



sup ( 

tes p ,||\/t— •v/ip||2<c r 



In 



and let $L_p$ be an envelope function of $\{\ln(s^* + t) - \ln(s^* + s_p),\ t \in \mathcal{S}_p\}$. Assume there
exist functions $\psi_p$ such that $\psi_p(x)/x$ is non increasing and for all $p > 2$ and $\sigma > 0$,

$$E_{P^*}\left[W_p(\sigma)\right] \le \psi_p(\sigma). \qquad (C.2)$$
Define $\sigma_p$ (depending also on $n$) as the unique solution of

$$\psi_p(\sigma_p) = \sqrt{n}\,\sigma_p^2. \qquad (C.3)$$



Now we follow and adapt the proof of Theorem 7.11 in Massart (2007). Let $p$ be such that
$K(s^*, s_p) < +\infty$. If $p'$ is such that $D(p') \le D(p)$, then one gets, as in Massart (2007) p. 241,



if j* 



2V^ 



< K (s*,s p ) + pen (p,n) 



in In 



2s* 



-pen(p',n) + V p (C.4) 



where 



n h VE'.,m*(j)/p(« -•«,*), 



Applying Lemma 4.23 in Massart (2007) p. 139, for any positive $y_{p'}$:



E* 



sup 



ln(s* + j) - In (a* + s p >) 



WVt- 



V \\2 



< 4 



(y P 'i 



Using Lemma 3, the fact that $2y_{p'}\|\sqrt{t} - \sqrt{s_{p'}}\|_2 \le y_{p'}^2 + \|\sqrt{t} - \sqrt{s_{p'}}\|_2^2$, and Lemma 7.26
p. 276 in Massart (2007), we obtain that for some constant $C > 0$, except on a set with
probability less than $K_2 \exp-(x_{p'} + x)$, for all $x > 0$:



In (s* +%') -In (s* + s p >) \ Cte ( ipp* (ypi) ( T{Lp>){xpi + x) | jx p i+: 



Vp' 



Vv' 



ny p > 



Here, $T(L_{p'}) = C_2 C^* c(L_{p'}) C(L_{p'})$. Using again Lemma 3 and Lemma 7.26 p. 276 in
Massart (2007) we get that, for some constant $C > 0$, except on a set with probability less
than $K_2 \exp-(x_{p'} + x)$, for all $x > 0$:



1 G„ (In (s* + V) - In (2s*)) C_ / T(L p ,){x p , + x) 



V P ' 



ny P > 
Now, using (C.1), we get



< 



s* + Ws* 



< 6\\Vs* 



and 



s*-^\\ 2 < 211 VF- \/^\\ 2 



and we finally obtain that, for some other constant $C > 0$ depending only on $P^*$, except on
a set with probability less than $2K_2 \exp-(x_{p'} + x)$, for all $x > 0$:



ln( a * + y) -ln(2s*) \ < _C f ifojjfr) T(L p ,)(x p , + x) 



y P > \ y P > 



ny P > 



Define for some constant a to be chosen 



(ay +x){l + T{L p ,)) 



Then we can follow the proof of Theorem 7.11 in Massart (2007) to obtain that, as soon as

pen (p,n) > «fo; + - Ei — P J , (C.5) 

one has for any $n > 2$, for some real numbers $\kappa > 0$ and $C > 0$ depending only on $Q^*$

E P > \h 2 (s*,s p )] <c(inf (K(s*,s p )+ pen (p,n) + E P . [V p }) + -\. 

But using the convexity of the Kullback-Leibler divergence in both arguments, we have, for
any $p$, $K(s^*, s_p) \le K(f^*, f_p)$. Thus to finish the proof of Theorem 4.1 one has to find
functions $\psi_p$ verifying (C.2), evaluate $\sigma_p$ using (C.3), and evaluate $T(L_p)$.
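The comparison between the divergence of two translation mixtures and the divergence of their components follows from joint convexity of the Kullback-Leibler divergence together with its invariance under a common translation. The sketch below illustrates this on a discrete analogue; everything in it is an assumption made for illustration (finite probability vectors on a cyclic grid instead of densities, cyclic shifts instead of translations, and the hypothetical vectors `f_star`, `f_p`, weights `mu` and shifts `shifts`).

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two probability vectors (q > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shift(p, s):
    """Cyclic shift of a probability vector by s cells (discrete translation)."""
    n = len(p)
    return [p[(i - s) % n] for i in range(n)]

def mixture(weights, shifts, p):
    """Finite translation mixture: sum_j weights[j] * p(. - shifts[j])."""
    out = [0.0] * len(p)
    for w, s in zip(weights, shifts):
        shifted = shift(p, s)
        for i in range(len(p)):
            out[i] += w * shifted[i]
    return out

# Hypothetical "true" and "approximating" component distributions.
f_star = [0.5, 0.3, 0.1, 0.05, 0.05]
f_p = [0.4, 0.3, 0.15, 0.1, 0.05]

# Same weights and translations on both sides, as in the mixtures s* and s_p.
mu = [0.6, 0.4]
shifts = [0, 2]
s_star = mixture(mu, shifts, f_star)
s_p = mixture(mu, shifts, f_p)

kl_mix = kl(s_star, s_p)    # divergence between the two mixtures
kl_base = kl(f_star, f_p)   # divergence between the two components
```

By joint convexity, `kl_mix` cannot exceed the weighted average of the component divergences, and each of those equals `kl_base` because a common shift leaves the divergence unchanged.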
Let us first prove that there exist constants $C, C' > 0$ depending only on $\delta$ and $Q^*$ such
that, as soon as (A4) holds, for any $p > 2$,



rd,/) -. ■: i 1 + ^ 



First of all, we see that we can take 

with c(L p ) = 2/(5, the function defined in Lemma [3] is given by 

k* 



(C.6) 



9 (s) = log 



1 



1 



b p \/2iTs*{u) 



f*(u — m*)du 



Under (A4), one gets that there exists a constant $C > 0$ depending only on $Q^*$ and $\delta$ such that
g is bounded by the constant Cln ^1 + p-J and (IC6|) follows (for maybe another constant 
C). 



To find functions $\psi_p$, we shall use Doukhan et al. (1995). Since $(Y_t)_{t\in\mathbb{N}}$ is geometrically
ergodic, Lemma 2 in Doukhan et al. (1995) implies that, for some constant $C$ that depends
only on $Q^*$, for any real function $f$,



ll/li; < C 7 (/)(l + log + ( 7 (/)), 7(/) = / f (1 + log + |/|)dP* 
where $\|\cdot\|_{2,\beta}$ is defined in Doukhan et al. (1995). Now, since for all $x > 0$, $x \ln x \le x^2/e$,



IS *T 



\f\*dF* ( l + log+(^ / |/| 3 dP*) 
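The elementary inequality invoked before this display is read here as $x \ln x \le x^2/e$ for all $x > 0$ (equivalently $\ln x \le x/e$, with equality at $x = e$); that reading is an assumption, since the extracted text is damaged at this point. The following sketch checks the inequality numerically on a logarithmic grid. The function name `gap` and the grid are hypothetical choices made for illustration.

```python
import math

def gap(x):
    """Difference x**2 / e - x * ln(x), expected to be nonnegative for x > 0."""
    return x * x / math.e - x * math.log(x)

# Logarithmic grid from 1e-6 to 1e6 (hypothetical range, 601 points).
grid = [10 ** (k / 50.0) for k in range(-300, 301)]

min_gap = min(gap(x) for x in grid)   # smallest observed gap on the grid
gap_at_e = gap(math.e)                # equality case x = e
```

The gap stays nonnegative on the grid and vanishes (up to rounding) at $x = e$, which is what allows a term of the form $f^2 \log^+|f|$ to be dominated by a multiple of $|f|^3$.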



Using Lemma 7.26 in Massart (2007), we thus get that for all $t \in \mathcal{S}_p$,

\\ln(s* +t) - In (s* + s p )f < ^\\Vt- v^lla 
for some constant $c > 0$ that depends only on $Q^*$, and the same trick leads to
$$H_\beta\left(u, \{\ln(s^* + t) - \ln(s^* + s_p),\ t \in \mathcal{S}_p\}\right) \le H_2\left(cu, \{\sqrt{t},\ t \in \mathcal{S}_p\}\right)$$

where $H_\beta(u, \mathcal{F})$ is the bracketing entropy of a set $\mathcal{F}$ at level $u$ with respect to $\|\cdot\|_{2,\beta}$, that is
the logarithm of the minimum number of brackets of $\|\cdot\|_{2,\beta}$-width $u$ needed to cover
$\mathcal{F}$, and $H_2(u, \mathcal{F})$ is the bracketing entropy of a set $\mathcal{F}$ at level $u$ with respect to $\|\cdot\|_2$.
Let for any $\sigma > 0$ and $p > 2$



7p <» 



y \j H 2 (cu, {y/t, t G Sp}^jdu. 



Using Theorem 3 in Doukhan et al. (1995) we get



-Bp* [W p (a)} < A Vp (a) 



1 g p (lAe(a,n)) 



(C.7) 



where e(<7, n) is the unique solution of x 2 /B(x) — r] 2 (a)/na 2 , 

B{x) = x + C(x — xhix) 
for some constant C that depends only on Q*, and S p is the function given by 



5 p (e) =supQ(t) VS(t) 



with for any t, Q (i) < u iff P*(if p (Yi) > u) < t. Here, £f p is an envelope function of 
{In (s* + 1) — In (s* + s p ) , || \ft — y^|| < cr, t G <S P }. Taking i/ p = L p one gets easily 



Q (t) < In 1 



tb p \/2'K 



so that 5 P (e) < sup t<e /i p (t) with 



/i p (t) = In 1 + 



tbpv2ir 



y/t + C(t - tint). 



The variations of /i p imply that there exists a universal constant b such that as soon as 
bp < b, h(t) is increasing on (0, 1), so that 



with 



S p (e A 1) = hp (e A 1) < ft p (e A 1) 



h p (t) = Cm j Vt\\nt\ (v^M A l) , 



for some universal constant $C$. Using Maugis and Michel (2011), we get that for some fixed
constant $K$, for all $u > 0$,



H 2 (u,{Vi, t G 5 P }) < fc> 



3 In 



mM 



In A p + K 



In (k*p) . 
Using the fact that for all $\epsilon \in\, ]0, 1]$,



and since 



du 



we get that for some other fixed constant $K$ and all $\sigma > 0$



V P 0) < -t p (a) 



(C.8) 



with 



i p (a) = \fk*p 



3Wln 



f 



(7 A 1 



■\/ln(fc*p). 



Now, one may use the upper bound (C.8) to upper bound $e(\sigma, n)$, and we get that for some
universal constant $C$,



e (cr, n) < C 



In 



2n 



Then we may set 



4>p O) = -*p (o-) 



i + — ^ 



C^ln 



A 1 



(C.2) holds, $\psi_p(x)/x$ is indeed non increasing, and if $\sigma_p$ is the unique solution of (C.3), we
obtain that for some constant $C$ depending only on $P^*$, as soon as $b_p \le b$,



C, 



a A p > -k*p 



In n + In 



In A r 



(C.9) 



D Proof of Theorem 4.2 



For simplicity's sake we denote in the following %i c{P) '■= Uloc(fi,7,'P)- Setp — p [(n/ log n) x ^ 2 ^ +1 '{\og n) 4,3 /( 2,3+] 
with po > fixed which we shall determine later, b p — bo(\ogp) 2 /p for some positive bo and 
A p = ao \ log6 p | for some positive ciq. The approximating f p G J- p 

p 



is taken from Kruijer et al. (2010). Let $\ell_j$ denote the $j$-th derivative of $\log f^*$. A simple
modification in the proof of Lemma 4 of Kruijer et al. (2010) gives that for any H and any
H with H > H + 3/3, there exists B such that if 

Dp := {y : r(y - m) > bf , \l){y -m)\< Bb~*\ log P p/ 2 , 3 < /3, 

\L(y -m)\< Bb-P\logp\-P /2 , VO < m < 2mj| 

then, for all y g D p and all < m < m*. 

- m) = P(y - m)(l + 0(i?(y - m)6g)) + 0((1 + - m))6f-*), (D.l) 
where the function R(y) is a linear combination of L(y) and of the functions \£j{y)\^^,
j < j3, and where the constants entering the terms 0{.) depend on %z oc (/3), H and H. 
Note that since the functions /* are bounded by polynomials, there exists a constant C such 
that \R(y - m)\ < C(l + R(y)), VO < m < 2m*. In the f ollowing we fix H > 4/3 + 2 7 and 
> H + 3β. Moreover, Lemma 4 in Kruijer et al. (2010) implies



K(f\ f P ) <b 2 /, / /* (log /* - log f P f {y)dy 



< 



(D.2) 



Here and further, $\lesssim$ will denote an upper bound up to a constant, where the constant
entering the upper bound depends only on $\mathcal{H}_{loc}(\beta)$. Throughout the proof $C$ denotes a
generic constant depending only on $\mathcal{H}_{loc}(\beta)$ and $Q^*$.

First of all, with such choices of $p$, $b_p$, and $A_p$, using Theorem 4.1 and (D.2), there remains
to prove that $E_{P^*}[V_p] \lesssim b_p^{2\beta}$ or equivalently $v_n E_{P^*}[V_p] \lesssim 1$ with $v_n = n^{2\beta/(2\beta+1)} (\log n)^{-6\beta/(2\beta+1)}$.
For any $\theta$ and any $y$, set



Y, k j =i^(j)fp(y- m j) \ 
E)liM*(i)/p(y-^)y 

(\y- m*\ +A p )\m j -m*\ 



First note that 



log 



w P (0, y) = log 



< max 

3 



b 2 
v 



(D.3) 



Thus we can bound 

n 

^2w p <fi,Yi)i.. g _ 



n 



d\\>M y/logn/n 



< — max > hp* 
n j f-f 

u„logp u„ 



| e - e 1 1 > M y^ogn/ri 



' (l^-m*! + A p )|m J --mj| |Aj — I ' 
62 u* 



< 



b p sfn y/n 



\\e-0\\ > Mo^logn/n 



by Theorem 3.2 and choosing $M_0 = 1/\sqrt{c^*}$.

Set now Hi > 3 + 2/3, = n {|y| < Hi ^(l/fc^r" 1 } and C p , 2 = D£ n {|y| > 
-Hi log(l/6p)T~ 1 }. Using (|D.3I) we get, for all * = !,••• ,n, 



< 



< 



Qogp) 



3/2 



(l gp)3/2 



s*{y)dy 



C P ,i 



62^ p 



as soon as $\gamma > (3/2 - \beta)_+$, where the last inequality comes from an adaptation of Lemma
2 in Kruijer et al. (2010), using the moment conditions (4.2). We also have



11c JYi)wJ9,Yi)!,, 21,^ A t 

u r>" P K ' \\9 — e||<Mo-y'log n/n 



< 



< 



\y\s*{y)dy 



(logp) 3 / 2 



(D.4) 



since $H_1 > 3 + 2\beta$, where the last inequality comes from the tail condition (4.2). There thus
remains to prove that



v n E v * 



1 n 

-J2 W p( 9 n,Yi)U Dp (Yi)% 



8-8\\<M y/logn/n 



< 1. 
We shall use:



v n E r 



1 " 



e-e||<M ^/logn/n 



< 



/■+°° ( v n 
J P * ( ^X>p(^,^)1K 0^)11 



||e-e||<M ^/log n/n 



> x da;. (D.5) 
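Bound (D.5) rests on the standard tail-integral inequality $E[X] \le \int_0^{+\infty} P(X > x)\,dx$, which holds with equality for nonnegative $X$. The sketch below verifies this exactly for finitely supported random variables using rational arithmetic; the helper names `expectation` and `tail_integral` and the two example distributions are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def expectation(values, probs):
    """Exact expectation of a finitely supported random variable."""
    return sum(v * p for v, p in zip(values, probs))

def tail_integral(values, probs):
    """Exact integral over x >= 0 of P(X > x) for a finitely supported X."""
    # Breakpoints of the step function x -> P(X > x) on [0, +infinity).
    pts = sorted({Fraction(0)} | {v for v in values if v > 0})
    total = Fraction(0)
    for lo, hi in zip(pts, pts[1:]):
        # P(X > x) is constant for x in [lo, hi), equal to P(X > lo).
        tail = sum(p for v, p in zip(values, probs) if v > lo)
        total += (hi - lo) * tail
    return total

# Signed variable: the tail integral only sees the positive part of X.
vals_signed = [Fraction(-2), Fraction(1), Fraction(3)]
probs_signed = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
mean_signed = expectation(vals_signed, probs_signed)
tail_signed = tail_integral(vals_signed, probs_signed)

# Nonnegative variable: expectation and tail integral coincide.
vals_pos = [Fraction(0), Fraction(2)]
probs_pos = [Fraction(1, 2), Fraction(1, 2)]
mean_pos = expectation(vals_pos, probs_pos)
tail_pos = tail_integral(vals_pos, probs_pos)
```

For the signed example the tail integral equals the expectation of the positive part, which is why the inequality (rather than an equality) is the natural form when the integrand can take negative values.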



Notice now that 
\ 1=1 

P* fvn||6n - <?1 > M(x))+¥* f sup 



> X \ < 



*/n\\9-8*\\<M(x)AM y/logn/n 



J=& n ( Wp (6,-)K Dp (-))>- 



as soon as 



If moreover 



v»Ki 



sup 

Vn\\9-8*\\<M(x)AMoVTogn 



Ev* sup G n (w p (9,-)l Dp (-)) < - 

\^n\\6-8*\\<M(x)/\M ov /\ogn J 4 



(D.6) 



(D.7) 



where $K_1$ is defined in Lemma 3 (Appendix C), using Theorem 3.2 and Lemma 3 we get, for
large enough $x$, with $M(x) = x^{1/4}$,



||e-e||<M ov /logn/n 



> a; < 



2 exp 



v n C n (x) 



2 exp ( 



HI 



with 



t„(x) 2 = 16C* 2 sup 

V" 1 1 e - 9 * 1 1 < M ( 21 ) A M o \A°gn 



V w ™ r «(a;) 2 
^[^(0,^)11^(^1) 



8 exp ^— c*s 



.1/2 



C n (x) = AC 2 C*c (W n ,p, s ) C (W w ) , 



where W n , P ,x is such that 



sup <W W (-). 

v / "ll^-S*||<M(x)AM VTogn 



For instance we may take 



\/nO p 

M(x) 



leading, by choosing c(W n , p , x ) = C ^ b 2^ gn , to 

A x 1 / 2 
Cn (x)=C^ 



y/nbl log n ' 



(D.8) 



For any 9 set 



k* k* 

s P,e(y) = X! f*U)My - m j) and s e(y) = XI v(j).f*(y - mj). 

3=1 3=1 
We consider the following decomposition,



log 



s P ,e*(y) 



= log 



4(y) 



log 



s*(y) 



log 



s P,e*(y) 



(D.9) 



The first and third terms of (D.9) are treated similarly. (D.1) gives that for any $\theta$, over $D_p$,

'spAvY 



log 



4(y) 



(D.10) 



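For the second term, since $f^* \in \mathcal{H}_{loc}(\beta,\gamma,\mathcal{P})$ with $\beta > 1/2$,

$$\left|\log f^*(y - m_j) - \log f^*(y - m_j^*)\right| \le L(y - m_j^*)\,|m_j - m_j^*|^{\beta\wedge1}.$$
Moreover, if $y \in D_p$, and $\sqrt{n}\,\|\theta - \theta^*\| \le M(x) \wedge M_0\sqrt{\log n}$, then for large enough $n$,

$$L(y - m_j^*)\,|m_j - m_j^*|^{\beta\wedge1} \le 1$$

so that we have, for $\theta$ such that $\sqrt{n}\,\|\theta - \theta^*\| \le M(x) \wedge M_0\sqrt{\log n}$, over $D_p$, for large enough
$n$,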



3 ^ 3 



< 



M(x 



^ + (n-VaM(x))^ £ Lfo - m*). (D.ll) 



Thus, using the fact that (3 > 1/2, for large enough x, 

sup ^[^^ Fi)]1 ^ (yi)] = {M{xfbf), 

Vn\\6-9*\\<M(x) AMovTogn 

(D.8) and (D.12) give that, for all $\beta > 1/2$, for large enough $x$,



(D.12) 



n.r 



so that for large enough a; 



> x l ' 2 n s ' 2 bf + 2 > ar 1 /2 (logn )3(20+2)/(2^+i) ) 



as soon as (D.6) and (D.7) hold for large enough



>x) < exp (-Ca; 1/2 ) (D.13) 



We now prove (D.6).

Ep.[w p (9,Y l )1L Dp (Yi)]= f (s*(y)-s p , e *(y)) log (^fr)dy-K( Spt9 ,,s Pt g) 

Jd p \ s p,e*(y)J 

SpAv) 



< / {s*(y)-s Pt e*(y))lo, 

Moreover, (jD.ll) and (|D.10[) give that 

' s P,e(y) 



Sp,e*{y) 



dy. 



\s*(y) - s p ,9*(y)\ 



log 



s* e (y) 



dy 



1/2 



<i4h(s*,s p ,g,)[ \R(y)\ 2 (s*(y) + Sp ^(y))dy) <bf 
using pX2|) . Also, (|D?T|) and (lETTTj) give that for 6> such that ^\\0-9*\\ < M(x)AM a ^/Togn,



\s*(y)-s P ,e*(y)\ 



log 



sj(v) 
s*(y) 



dy < b 2 /M(xf M 



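so that for $\beta > 1/2$, uniformly for $\theta$ such that $\sqrt{n}\,\|\theta - \theta^*\| \le M(x) \wedge M_0\sqrt{\log n}$,

$$E_{P^*}\left[w_p(\theta, Y_1)\,\mathbb{1}_{D_p}(Y_1)\right] = O\left(M(x)\,b_p^{\beta}\right)$$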



and (D.6) holds for large enough $x$.

To prove (D.7) we use (D.9). We first control



E P . I _ sup G n (]l Dp log(s Pi e/4)) 

\ yfn\\e-6*\\<M(x)AM ^/logn , 



Using (D.10), we can bound on $D_p$,

log^(y)) < \R(y)\bP < (logp)- 1 ^ < i 



for n large enough, uniformly over y/n\\9 — 6*\\ < M(x) A M ^\og n. Also, H/H 2 g < / / 2 (1 + 
l°g + l/l)(y)^y ^5 ll/lll) f° r an Y / m the fo rm l°g( s p,e/ s 0)- We denote 



with 



5„, p ,i(ct,x) = {log (s p , e /4) , Vn||« - < M(a:) A M Vlogn, II log(s p , 9 /sJ)|| 2 < ct}. 
Then for all ?/ g D p , since \y \ < A p , and for all \rrij — m'A < r/, 



f P {y ~ m'j) = n<Pb p (y - rrij - a t )e 



;=i 


(lvl+n 






<f P (y- 


mj-)e 






<f P (y- 




:= fu(y 


- m j) 


>f P (y- 






h(y- 



and 



0A1 , 



/* (y - m'i ) < /* (y - m^e" sup ' — *'<" 



. K(»-m)| 



> f*(y- mj )e 



—V SU P|m-,i 



\t(y-m)\ 



(D.14) 



where £(y — to) = l\(y — in) if /3 > 1 and £(y — m) = L(y — rrij) if /3 < 1, so that a bracket 
for log(s P) e'/ s *^')^ is given on D p by 



( 3A P T) 



if* 1 1 sup |/(y-m)| ] +log(l + 7 7 £>(j)- 1 



\ P \m-mj\<ri / J=1 



;/ sup |l"(y-m)| +log(l-ry^/i(j) 1 ), 



i\<V 



3=1 



Thus if u > and r/ < r\^(v}^ 2 b 2 p j A p A u*^ 1 )/ 2 ) with 770 > small enough, 

(Upfi - L p , e ) 2 (y)s*(y)dy < u, 
so that



<pi(ir) < (70og + (l/(7) + log(nM(x)). 
Moreover for all \\9 - 8*\\ < M ^\ogn/y/n, (fDTT|) implies that 

||iog(W4)ll2<&fc. 



Therefore using Theorem 2 of Doukhan et al. (1995) and the fact that the chain is geometrically
ergodic, we obtain that



sup 

^/n\\6-6*\\<M(x)/\M ^/Togn 



for x > 1 and large enough n. We now study 



Ep* sup G n 

\^/n\e-9*\\<M(x)AMoV^ogn 



G n (H Dp ]og(s p , e /s* e )) < b?{\ogn + logM(x)) 



< xl 



^D p fog 



s*(y) 



Using (D.14), if $\sqrt{n}\,\|\theta - \theta^*\| \le M(x) \wedge M_0\sqrt{\log n}$



M | < V ( " A1) V^/V^ = o(i), 



so that 



log 



4(v) 

s*(y) 



< 



2.13 



log 



ge(j/) 

s*(y) 



(D.15) 



< max^j/^ - l) 2 + max / s*(y)(log /*(?/ - m 3 ) - log f*(y - m*)) 2 dy 

< (M(x) 2 /nf A1 . 



Hence using the same tricks as before and applying Theorem 2 of Doukhan et al. (1995) we
obtain that for large enough $n$,



Ep* _ sup G n (H Dp \og(s p ,o/s* e )) \ <M{x)n^ M V 2 ^g~n~ = o(x^i/v n ) 

\^/n\\9-e*\\<M(x)AMoVTogn J 

(D.16) 



for all $x$ and (D.7) is satisfied.

Finally, (D.13) holds, which, together with (D.5), ends the proof of Theorem 4.2.